
CONTENTS.
Vol. III. Year 1936.

No. 1, 1936.
                                                                  PAGES
Application of Statistical Principles to an Industrial
  Problem. By W. J. Jennett and B. P. Dudding
  (Communication from the Research Laboratories of
  the General Electric Co., Ltd., Wembley, opening a
  discussion) ............................................ 1-12
Discussion on the Paper .................................. 12-28
Specification of Rules for Rejecting too Variable a Product,
  with Particular Reference to an Electric Lamp Problem.
  By B. L. Welch, B.A. ................................... 29-48
Statistical Analysis of Field Counts of Diseased Plants.
  By W. G. Cochran, B.A. ................................. 49-67
Square Root Transformation in Analysis of Variance.
  By M. S. Bartlett ...................................... 68-78
Tests of Significance in Analysis of Co-variance.
  By J. Wishart, M.A., D.Sc. ............................. 79-82
Correspondence: Complex Experiments (J. Neyman and
  F. Yates) .............................................. 83-85

No. 2, 1936.
                                                                  PAGES
Inverse Interpolation and Scientific Applications of the
  National Accounting Machine. By L. J. Comrie ........... 87-113
Co-operation in Large-Scale Experiments: A Discussion
  opened by W. S. Gosset ................................. 114-136
Statistical Methods Applied to the Manufacture of Spectacle
  Glasses. By C. E. Gould and W. M. Hampton,
  Ph.D., B.Sc., F.Inst.P., A.I.C. ........................ 137-155
Discussion on the Paper .................................. 156-177
The Distribution of "Student's" Ratio for Non-Normal
  Samples. By R. C. Geary, M.Sc. ......................... 178-184
Some Notes on Insecticide Tests in the Laboratory and
  in the Field. By M. S. Bartlett ........................ 185-194
An Enumeration of the Confounded Arrangements in the
  2 x 2 x 2 ... Factorial Designs. By M. M. Barnard,
  M.A., B.Sc. (Melbourne) ................................ 195-202
Index to Vol. III (1936) ................................. 203-204









SUPPLEMENT 

TO THE 

JOURNAL OF THE ROYAL STATISTICAL SOCIETY 
Vol. III, No. 1, 1936.


First Meeting of the Session 1935-36, 28th November, 1935. 

[Dr. E. C. Snow in the Chair.] 

The Chairman, in opening the meeting, said that before calling upon 
the speakers, he would like to give a brief account of the origin of this 
particular meeting. In the summer the Committee of the Section 
sent a letter to a number of manufacturing firms in this country, 
suggesting that if they had problems which they thought were 
amenable to statistical treatment on the lines on which the Committee 
was working, the Section would be glad if they would present them, 
so that they might be examined by statisticians, and then, if thought 
suitable, discussed by the Section. The first fruits of the appeal 
would be seen in the discussion, which would now be opened by 
Mr. Dudding and Mr. Jennett. They proposed to do this as a duet 
rather than a solo, and Mr. Dudding would be the first to address the 
meeting. 

* 

The Application of Statistical Principles to an
Industrial Problem.

By W. J. Jennett and B. P. Dudding.

(Communication from the Research Laboratories of the General Electric
Co., Ltd., Wembley, opening a discussion.)

Introduction. 

We feel that we owe an apology to the many experts present at 
this meeting for presuming to ask them to consider an industrial 
problem which possesses little of interest for them. 

One of us has long been convinced of the desirability, necessity
even, of applying statistical principles to the design of experiments
undertaken under industrial conditions. The formation of
this special Section of the Royal Statistical Society appeared to fill
a long-felt need, but the response of those occupied in the manufacturing
industries has not been all that could be desired.


It may be that the zeal and oratorical skill displayed in discussion
by the expert have acted as a deterrent to those who look
for relief in the evenings from the strenuous and often vitriolic
atmosphere of a works. Therefore, we ask the sympathetic indulgence
of the expert when discussing the problem we place before
you. It is our hope that other industrial workers will follow our
example, and that as a result of the work of this Section, the expert
may enjoy the satisfaction of having contributed to the increased
efficiency of British industry.

The problem examined below was part of a much more comprehensive
study of lamp-making processes. It is presented in the form
in which it was carried out in 1931-1933, and no attempt has been
made to revise the treatment of the data in the light of knowledge
acquired since the date when the work was initiated.

The problem. 

The material concerned takes the form of fine tungsten wires, 
and the quality aimed at is the longest possible life when operating 
in a lamp at a given efficiency of light production, or inversely, the 
highest possible efficiency for a given life. 

The life test involves a period of about 1,000 hours and an
expenditure of approximately £1 per 100 watt lamp tested.

It is obviously desirable to devise wire quality tests which do not 
involve so much time or so much expense, and particularly tests 
suitable for application at different stages in the production of the 
wire. 

A number of wires which had been subjected to a variety of 
manufacturing processes were tested over a period of rather more 
than one year. As a result of a statistical analysis of the data derived, 
it was demonstrated that a particular process led to the best results 
both with respect to working properties and the life of the wire. Also, 
that for wires made by this process, two short-period tests, designated 
A and B in this paper, appeared to be correlated with the performance 
of the wire in the lamp. It was desired to confirm these conclusions 
and to examine more closely the degree of correlation between tests
A and B and the lamp life. 

The particular practical difficulties involved in the work are 
indicated in what follows. The minimum amount of wire required 
for complete testing is about 50 metres. This length of wire will 
have had the same history apart from inevitably small variations 
in temperature during the wire working and drawing processes. 

Tests A and B involved testing five short lengths from each 
end of the wire, the intermediate portion being coiled into a helix
from which the finished filaments are made. 





Among the factors which it is necessary to take into account
throughout the experiments are:

(a) changes in treatment during coiling;

(b) changes in treatment during dissolving out the mandrel on
which the wire is coiled;

(c) changes in the conditions during exhaust and gas-filling the
lamp.

The procedure adopted was to select at random eight samples
per week from the wires being used in the factory and to subject
each wire to tests A and B. This was continued until about 50
wires had been tested, and as it was not desired to test every wire
in lamps, the number so tested was reduced by selecting those which
showed the widest variations in the properties determined by tests
A and B.

[Fig. 1. Plot of the test results for the sampled wires, with the
limits; horizontal axis A, 250 to 350.]

From the data of the tests on all the wires limit values for the
two measured properties were determined within which 19 out of 20
should fall if the quality of the material were statistically consistent.
These limits and test results are shown plotted in Fig. 1. Preliminary
limits were actually calculated when about 30 wires had
been sampled in order that coiling of the wires could be commenced
while sampling was still in progress. The median was used as the
representative statistic in view of the skew distribution of the
individual results. It may be noted here that the number of points
lying outside the limits is greater than might be expected (1 in 20),
indicating lack of effective control in the manufacture of the wire.
The correlation coefficient for the points plotted in the figure is
0.26. This is slightly below the value for the 0.05 probability
level, which is 0.29.
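The comparison of a correlation coefficient with its 0.05 level may be sketched as follows. The sketch assumes the usual relation between the correlation coefficient and Student's t for n pairs, and it takes the 5 per cent points of t from standard tables; neither the function name nor the tabulated points appear in the paper. Since the exact number of wires plotted is given only as "about 50", the sketch merely brackets the quoted level of 0.29 between the values for 42 and 62 pairs.

```python
import math


def critical_r(t_crit, n_pairs):
    """Smallest |r| judged significant, given the two-sided t point for n_pairs - 2 d.f."""
    df = n_pairs - 2
    return t_crit / math.sqrt(t_crit ** 2 + df)


# Two-sided 5 per cent points of Student's t, as printed in standard tables
# (an assumption of this sketch; they are not given in the paper).
T_5_PERCENT = {7: 2.365, 40: 2.021, 60: 2.000}

r_nine_pairs = critical_r(T_5_PERCENT[7], 9)    # nine pairs, 7 d.f.
r_42_pairs = critical_r(T_5_PERCENT[40], 42)    # 42 pairs, 40 d.f.
r_62_pairs = critical_r(T_5_PERCENT[60], 62)    # 62 pairs, 60 d.f.

print(round(r_nine_pairs, 3), round(r_42_pairs, 3), round(r_62_pairs, 3))
```

The same function applies to any number of pairs for which the t point is to hand.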

The wires whose properties were outside or near the limits were 
chosen for coiling, lamp-making and life-testing. 

They were all coiled on the same machine. The same length of
mandrel wire which had been carefully tested for uniformity of
diameter was used throughout the test. The statistician will
probably think here of the effect of variations in the coiling machine
on the experiment. In this case, our method of dealing with these
was to design a special machine in which major variables were, as
far as possible, eliminated.

Table I.

Data for nine wires chosen for coiling.

Wire Ref.        Date Stamped.      A.          B.        Life.
OAZS 238/3538      28.4.33       271  278     19  23      1760
OAZS 238/3481      12.5.33       302  319     22  23      1545
OAZS 238/3122      12.5.33       298  300     14  14      1365
OAZS 236/1773      12.5.33       275  260     18  20      1795
OAZS 229/676       12.6.33       326  321     23  24      1380
OAZS 238/3199      12.6.33       344  336     16  17      1050
OAS 238/812        28.4.33       281  295     13  19      1715
OAS 229/1610       23.5.33       293  295     25  25      2140
OAS 229/1395       12.6.33       319  318     19  18      1105

Tests were made in which this machine was compared with 
standard machines by coiling adjacent lengths of the same wire on 
each. The improved lives obtained from filaments made on the 
special machine indicated that it was of better quality than the 
standard type and hence was more likely to be suitable for this 
investigation. This is referred to again later. 

The lamp-making processes were so arranged that any variations 
arising therein were likely to influence equally each batch of lamps 
made. 

The test data of nine wires which were chosen for coiling are set
out in Table I and are shown graphically in Figs. 2 and 3. It will
be noted that two sets of figures for the tests A and B appear in the
table, and these refer to the beginning and end of the length of wire,
the intermediate portion of which is coiled. In the graphs the means
of the pairs of figures shown in the table are used. The values for
lamp life are the average result for a group of 10 lamps. Taking
merely the nine sets of values used for plotting the points in the
graphs, correlation coefficients were calculated as follows:

(a) Between A and Life the coefficient is -0.78. The 0.05
level is 0.66 and therefore this coefficient is significant.

(b) Between B and Life the coefficient is 0.43, which is therefore
not significant. The partial correlation coefficient between B and
Life, when changes in A are allowed for, is 0.73. This is a
significant degree of correlation.
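The calculation of these coefficients may be sketched as below, taking the figures of Table I and using the mean of the two end readings for A and for B, as is done in the graphs. The function names are ours, and the partial coefficient is obtained from the three simple coefficients by the usual formula, which the paper does not write out. The results need not agree to the last figure with those quoted, since the authors' working figures are not fully recoverable from the table.

```python
import math

# (A at start, A at end, B at start, B at end, mean life) for the nine wires of Table I.
TABLE_I = [
    (271, 278, 19, 23, 1760),
    (302, 319, 22, 23, 1545),
    (298, 300, 14, 14, 1365),
    (275, 260, 18, 20, 1795),
    (326, 321, 23, 24, 1380),
    (344, 336, 16, 17, 1050),
    (281, 295, 13, 19, 1715),
    (293, 295, 25, 25, 2140),
    (319, 318, 19, 18, 1105),
]

A = [(a1 + a2) / 2 for a1, a2, _, _, _ in TABLE_I]
B = [(b1 + b2) / 2 for _, _, b1, b2, _ in TABLE_I]
L = [life for *_, life in TABLE_I]


def corr(x, y):
    """Product-moment correlation coefficient of two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y))
    sxx = sum((u - mx) ** 2 for u in x)
    syy = sum((v - my) ** 2 for v in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(r_xy, r_xz, r_yz):
    """Correlation of x and y when z is held constant."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


r_al = corr(A, L)
r_bl = corr(B, L)
r_ab = corr(A, B)
r_bl_given_a = partial_corr(r_bl, r_ab, r_al)

print(round(r_al, 2), round(r_bl, 2), round(r_ab, 2), round(r_bl_given_a, 2))
```

The small value of the coefficient between A and B is what allows B to show a much stronger association with life once A is allowed for.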

We conclude, therefore, that A and B are measures of wire
properties which are only slightly related and each of which, in its
own way, may be a determining factor in wire quality. It is likely
that B would be of greater value when production is controlled
to give results more uniform with respect to A.

[Fig. 2. Lamp life plotted against A; horizontal axis A, 250 to 350.]

The multiple correlation coefficient of life on A and B is 0.90.
A high degree of correlation is indicated, and this is readily seen from
Fig. 4, which shows a solid diagram of the three variables.

It will be useful to give the equation of the lines and plane of
best fit calculated by the method of least squares.

$$L = 5330 - 12.5\,A \quad (\pm 225),$$

$$L = 730 + 40.5\,B \quad (\pm 325),$$

$$L = 4500 - 12.6\,A + 43\,B \quad (\pm 160),$$

where L is the life in hours.
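As an illustration of how the three equations are used, the sketch below evaluates each for a wire with given test values. It reads the equations as L = 5330 - 12.5 A, L = 730 + 40.5 B and L = 4500 - 12.6 A + 43 B; the function names and the example wire (A = 300, B = 20) are hypothetical and are not taken from the paper.

```python
def life_from_a(a):
    """Life in hours estimated from test A alone (limits quoted as +/- 225)."""
    return 5330 - 12.5 * a


def life_from_b(b):
    """Life in hours estimated from test B alone (limits quoted as +/- 325)."""
    return 730 + 40.5 * b


def life_from_a_and_b(a, b):
    """Life in hours estimated from both tests (limits quoted as +/- 160)."""
    return 4500 - 12.6 * a + 43 * b


# A hypothetical wire, chosen only for illustration.
example_a, example_b = 300, 20
print(life_from_a(example_a), life_from_b(example_b),
      life_from_a_and_b(example_a, example_b))
```

The narrower limits attached to the third equation reflect the gain from using both tests together.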







Practical application of results. 

Although the above results are significant they are clearly of 
practical value only when lamp performance can be improved 
by the use of the information contained therein. Obviously an 
increase in average life would be obtained if wires could be made 
whose measured properties were in the lower range of values for test 
A and higher range of values for test B. 

In addition, however, an increase in the uniformity of lamps
would result were variations in A and B eliminated. The contribution
of variations in A and B to the total variation of lamp life
was estimated as follows:

The equation for calculating life from A and B was used to estimate
lives for all the wires sampled from the factory (shown in Fig. 1).
The grand average was c. 1,500 and the standard deviation 230
hours for the average results of batches of 10 lamps. Each average
life for 10 lamps estimated from the equation involves a standard
error of 160 hours. Hence $\sqrt{230^2 - 160^2} = 165$ hours is an
estimate of the standard deviation of life due to variations between
wires only.

[Fig. 3. Life in hours (600 to 2200) plotted against B (10 to 26),
with the 0.95 probability limits. Correlation coefficient = 0.43.
Equation for regression of life on B: L = 730 + 40.5 B (± 325).]

[Fig. 4. Solid diagram of the three variables, life, A and B.]
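The subtraction of variances used in this estimate may be sketched as follows, on the assumption (implicit in the text) that the error of estimation and the variation between wires are independent, so that their variances add. The function name is ours.

```python
import math


def between_wire_sd(sd_of_estimated_lives, se_of_estimate):
    """Standard deviation left after removing the error of estimation.

    Assumes the two sources of variation are independent, so that
    variances (not standard deviations) are subtracted.
    """
    return math.sqrt(sd_of_estimated_lives ** 2 - se_of_estimate ** 2)


sd_between_wires = between_wire_sd(230, 160)
print(round(sd_between_wires))
```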

The variation within the batches of 10 lamps made from the
same wire gives an estimated standard deviation of 210 hours for
single lamps. From these figures it appears that variations between
wires due to variations in A and B contribute an appreciable proportion
of the total variability in lamp life, and that a considerable
reduction in lamp life variation would result were production controlled
to give wires more uniform with respect to A and B.

In view of the significant results obtained in the above experiment
it was of interest to see whether the data for wires prepared
by the particular process and tested in the course of normal experimental
work over a period of about a year conformed to the
limits stated in the equations obtained for the nine wires which
were specially selected and made into lamps.

In Table II the results for this larger number of wires are given.
It was shown from the comparisons mentioned on p. 4 that the
machine on which these wires were coiled gave lives lower than those
of the special machine. The lives shown in Table II have been
increased by 8 per cent., which is the estimated difference between
the machines, and the results plotted in Figs. 5 and 6 together
with those for the nine wires.

Table II.

Data for wires prepared by special process and tested in the
course of normal experimental work.

Ref. No.          A.      B.     Life.
S 226/6415A      355    16.4      855
S 226/6415B      332    15.4     1015    Treated at
S 226/6413A      350    22.0     1070    0.9
S 226/913A       360    15.0      890
S 226/8841       357    15.2     1095
S 226/8840       363    17.4      900    Treated at
S 226/6413B      356    20.4      905    0.9 and 0.5
S 226/913B       345    14.6      950
S 226/7622       356    17.1     1170
S 226/C          276    14.2     1605
S 229/A          293    15.6     1120
S 229/B          288    16.1     1320    Treated at
S 226/913C       315      -      1225    0.9, 0.5
S 222/A          305    15.2     1055    and 0.15
S 222/A          315    14.6     1390
S 223/12119      306    21.4     1385
S 223/12120      286    19.4     1700
S 226/7621       289    18.9     2070
S 226/7623       296    18.5     1395
S 226/6413C      335    20.8     1105


[Figs. 5 and 6. Life in hours for the wires of Table II, plotted
together with the nine specially selected wires.]


Table IIIa.

Wires sampled from Factory. Initial Readings.

Date of
Sample.   Wire.                      A.                         B.

7.4.33    OAS 236/18072     304, 299, 299, 301, 301    12, 14, 12, 12, 13
          OAS 233/11902     294, 292, 289, 291, 291    14, 14, 17, 15, 16
          OAS 226/2039      315, 315, 315, 315, 314    20, 19, 20, 14, 15
          OAS 236/2488/2    281, 279, 279, 277, 280    15, 16, 19, 15, 21
          OAS 236/1808/2    282, 282, 284, 279, 278    17, 16, 17, 20, 17
          OAS 226/1947      304, 302, 304, 304, 304    20, 21, 21, 19, 23
          OAS 236/1885/2    280, 280, 283, 285, 284    16, 19, 15, 17, 15
          OAS 233/12090     300, 302, 303, 288, 302    24, 20, 18, 16, 18

28.4.33   OAZS 233/1295     309, 314, 313, 309, 310    19, 22, 18, 18, 26
          OAS 233/12113     302, 301, 301, 302, 301    25, 21, 23, 21, 21
          OAZS 236/2464     298, 298, 299, 300, 295    19, 19, 21, 21, 20
          OAZS 237/1120     273, 272, 272, 272, 274    21, 18, 15, 18, 15
          OAZS 238/3538     272, 271, 271, 271, 271    19, 22, 18, 19, 20    C
          OAZS 238/3722     278, 280, 281, 282, 280    16, 15, 17, 16, 18
          OAS 238/812       281, 281, 282, 281, 285    13, 12, 13, 15, 13    C

5.5.33    OAZS 236/4086     300, 301, 301, 282, 301    21, 18, 23, 22, 25
          OAZS 238/3481     292, 296, 294, 299, 298    18, 20, 16, 20, 17
          OAS 238/3654      295, 291, 292, 296, 292    21, 22, 20, 21, 22

12.5.33   OAZS 238/3122     296, 298, 296, 299, 300    16, 13, 14, 14, 14    C
          OAS 229/1512      288, 289, 292, 295, 291    19, 17, 19, 14, 18
          OAZS 238/2809     292, 291, 290, 290, 285    28, 20, 16, 24, 20
          OAZS 238/2855     305, 304, 302, 309, 305    22, 23, 23, 20, 15
          OAZS 236/1773     275, 278, 275, 274, 274    17, 19, 19, 18, 17    C
          OAS 233/11956     292, 295, 290, 292, 294    22, 18, 23, 26, 21
          OAZS 238/3481     307, 302, 307, 302, 302    22, 22, 25, 21, 25    C
          OAZS 239/4088     306, 307, 304, 307, 306    22, 22, 20, 22, 19

23.5.33   OAZS 238/3165     324, 331, 326, 323, 326    17, 18, 16, 19, 17
          OAS 238/3087      319, 324, 320, 322, 324    22, 20, 17, 18, 23
          OAZS 238/2968     324, 318, 324, 324, 318    17, 15, 18, 18, 18
          OAZS 238/579      300, 294, 300, 302, 303    18, 21, 18, 21, 18
          OAZS 238/2764     309, 316, 313, 310, 314    19, 20, 19, 18, 23
          OAZS 236/1773     265, 267, 266, 268, 266    16, 17, 18, 17, 16
          OAS 229/1610      296, 293, 301, 294, 293    28, 22, 22, 27, 25    C

26.5.33   OAZS 236/1773     294, 294, 290, 296, 291    17, 16, 16, 14, 16
          OAZS 229/676      313, 314, 312, 313, 313    24, 24, 25, 22, 20
          OAZS 238/3190     310, 304, 310, 310, 310    22, 25, 22, 24, 20
          OAZS 238/2672     318, 318, 315, 316, 324    21, 19, 14, 21, 19
          OAS 238/2800      308, 303, 299, 302, 302    19, 22, 27, 19, 19
          OAZS 238/3221     316, 317, 317, 317, 314    17, 17, 17, 14, 17

12.6.33   OAS 229/1610      295, 299, 299, 295, 299    19, 19, 22, 21, 21
          OAZS 238/2552     323, 324, 326, 333, 327    23, 22, 22, 22, 25
          OAZS 236/1773     266, 268, 270, 267, 265    20, 18, 20, 16, 14
          OAZS 229/676      326, 327, 327, 324, 322    27, 23, 23, 20, 23    C
          OAS 229/1515      299, 292, 292, 291, 289    18, 18, 19, 19, 17
          OAZS 238/3199     344, 345, 344, 341, 341    18, 20, 16, 16, 16    C
          OAS 233/11744     297, 300, 297, 296, 293    16, 17, 22, 20, 17
          OAS 229/1395      322, 319, 319, 319, 318    22, 19, 18, 21, 15    C

29.6.33   OAZS 238/3199     322, 328, 323, 324, 323    21, 15, 16, 14, 14
          OAS 229/1395      323, 318, 320, 327, 320    21, 18, 22, 22, 20

Wires marked "C" were those chosen for coiling.





The lines of best fit and 19/20 limits are shown, and it will be 
seen that the points are statistically consistent in respect to these 
limits. In view of the fact that no particular precautions were 
taken in filament and lamp making, we feel that the results from the 
more carefully controlled experiments are not subject to any grave 
disturbance due to variations arising in filament coiling and/or 
lamp-making. 

The foregoing is a brief account of the experimental and analytical 
procedure carried out. 

We feel that the statistician will probably make suggestions
as to how the experiment might have been better planned and that
he will readily criticize our methods of analysis. We have always
had in mind that, although the values shown in the graphs, on
which the calculations of the correlation coefficients are based,
represent a number of individual tests, little use has been made
of this fact. The real difficulty lies in the fact that no
particular individual result from one test can be associated with
an individual result from another. In tests to destruction it is
obvious that not more than one test can be applied to any particular
piece of wire.

Hoping that statisticians may show us how to make more efficient
use of our results we give all the individual readings for the planned
experiment here.

Table IIIb.

Test values at other end of wire after coiling length had been taken.

Wire No.                   A.                         B.
OAZS 238/3538     278, 278, 278, 278, 278    24, 20, 23, 23, 20
OAS 238/812       296, 294, 295, 296, 291    17, 20, 21, 19, 18
OAZS 238/3122     300, 293, 304, 293, 303    14, 14, 15, 14, 15
OAZS 236/1773     260, 260, 260, 265, 260    21, 20, 30, 18, 25
OAZS 238/3481     326, 319, 319, 312, 312    23, 23, 21, 24, 21
OAS 229/1610      298, 291, 298, 295, 291    25, 20, 24, 25, 25
OAZS 229/676      309, 312, 321, 321, 322    24, 24, 28, 28, 23
OAZS 238/3199     341, 336, 331, 337, 335    19, 18, 17, 16, 15
OAS 229/1395      318, 318, 315, 321, 325    19, 18, 18, 18, 19

Wire No.          Individual Lives.
OAZS 238/3538     875, 1225, 1860, 1860, 1860, 1890, 1890, 2040, 2040, 2070
OAS 238/812       1280, 1450, 1530, 1670, 1705, 1830, 1930, 1990, 2030
OAZS 238/3122     1240, 1270, 1280, 1310, 1410, 1415, 1420, 1440, 1480
OAZS 236/1773     1610, 1610, 1680, 1680, 1700, 1770, 1840, 1980, 2020, 2040
OAZS 238/3481     1245, 1300, 1350, 1500, 1540, 1550, 1610, 1660, 1790, 1910
OAS 229/1610      1910, 1960, 2040, 2070, 2140, 2170, 2210, 2260, 2300, 2320
OAZS 229/676      1030, 1190, 1230, 1440, 1450, 1450, 1460, 1480, 1530, 1560
OAS 229/1395      910, 925, 940, 1020, 1025, 1120, 1160, 1230, 1280, 1440
OAZS 238/3199     500, 745, 890, 1100, 1100, 1100, 1140, 1230, 1280, 1430
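The relation between these individual readings and the summary figures of Table I may be checked for a single wire as sketched below. The sketch assumes, following the statement that the median was used as the representative statistic, that each entry for A and B in Table I is the median of the five readings at one end of the wire, and that the life is the mean of the batch; the variable names are ours. The wire taken is OAZS 238/3538.

```python
import statistics

# Readings for wire OAZS 238/3538, copied from Tables IIIa and IIIb
# and from the table of individual lives.
a_initial = [272, 271, 271, 271, 271]
a_other_end = [278, 278, 278, 278, 278]
b_initial = [19, 22, 18, 19, 20]
b_other_end = [24, 20, 23, 23, 20]
lives = [875, 1225, 1860, 1860, 1860, 1890, 1890, 2040, 2040, 2070]

summary = {
    "A": (statistics.median(a_initial), statistics.median(a_other_end)),
    "B": (statistics.median(b_initial), statistics.median(b_other_end)),
    "life": statistics.mean(lives),
}
print(summary)
```

For this wire Table I gives 271 and 278 for A, 19 and 23 for B, and a life of 1760 hours, the last apparently rounded.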


Discussion on Messrs. Jennett’s and Dudding’s Paper. 

Prof. E. S. Pearson : Mr. Dudding and Mr. Jennett have asked 
for the indulgence of expert statisticians here, but I do not really 
think that they need have done that; if they will not allow us to call 
them statisticians, we see in them, nevertheless, a type of practical 
man whose keen appreciation of the value of statistical methods is an 
encouragement to us, and whose criticism of some of our zeal and 
oratory we can recognize as well deserved. 

I do not propose to make any suggestions myself as to possible 
improvement in the methods which the authors of the paper have 
used, because I think that I can contribute most usefully to the 
discussion by saying a few words on the question of regression and 
correlation, and by illustrating these points on another model I have 
here of Mr. Dudding’s data. Such a contribution may clear the 
ground a little for the statisticians who speak later, and may be a help 
to those not familiar with some of the statistical technique that will 
be introduced. 

In this paper the authors give both the coefficients of correlation
and the regression equations; here I think all statisticians would
agree that in this particular problem it is the straight lines and their
slope and that of the regression plane which are important, rather
than the coefficients of correlation. If you are dealing with two
correlated variable quantities, which are each distributed approximately
in a normal distribution, the coefficient of correlation, with
the means and standard deviations, will generally give a complete
specification of the form of relationship, but as soon as you begin to
select (and quite definitely in this particular problem a selection of
certain wires was made according to the value of A and B) values
of the correlation coefficient may be somewhat misleading.

The statistician's approach to the problem of regression may be
illustrated on a diagram such as the authors' Fig. 3, which is reproduced
roughly in Fig. 7. y is written for L (or life) and x for the
character B. The so-called regression straight line,

$$Y_x = a + bx \qquad (1)$$

is fitted to the observations by the method of least squares. If the
value of y is in no way associated with x, the line should be horizontal
and b = 0.

It is seen that any value of y may be regarded as made up of three
parts, represented by the equation

$$y = \bar{y} + (Y_x - \bar{y}) + (y - Y_x). \qquad (2)$$

The first part, $\bar{y}$, is common to all the observations, the second,
$Y_x - \bar{y}$, is that part which can be associated with changes in x, while
the third, $y - Y_x$, is a residual portion varying in a random manner
from observation to observation. If the variation in x could be
controlled, e.g. if the character B of the filament could be kept constant,
then the variation in y (or life of lamp) would be diminished,
being represented by the part $(y - Y_x)$ only. It will be realized,
however, that even if 9 points were set down at random on the paper,
it is unlikely that the straight line, (1), fitted to them would be exactly
horizontal.*

Consequently, it is important to determine whether in fact the
slope of the line, i.e. the regression coefficient, has any significance.
If, for example, tests were made on 9 further wires, should we expect
to obtain from the results a line of approximately the same slope, or
might we even find the reverse and b to be negative?


[Fig. 7. An Interpretation of Regression.]



The significance of the regression will depend on the relative
magnitude of the parts $Y_x - \bar{y}$ and $y - Y_x$, and the statistician in
applying his test makes use of the identity (holding because (1) has
been fitted by “least squares”†)

$$S(y - \bar{y})^2 = S(Y_x - \bar{y})^2 + S(y - Y_x)^2, \qquad (3)$$

where S indicates the summation for all observations. He compares
the first sum of squares on the right-hand side of (3) with the second;
the larger the ratio the more confident he feels of the reality of the
contribution to y from x, e.g. to the influence of character B on length
of life L. His measure of confidence is determined by the reference
of this ratio (or its logarithm) to appropriate statistical tables.

* In the extreme case where only 2 points were put down at random, the
straight line would pass through them, and therefore be almost certain to have
some slope.

† It will be seen that

$$S(y - \bar{y})^2 = S\{(Y_x - \bar{y}) + (y - Y_x)\}^2 = S(Y_x - \bar{y})^2 + S(y - Y_x)^2 + 2S(Y_x - \bar{y})(y - Y_x)$$

and the product term vanishes since owing to the method of fitting the regression
line the differences $Y_x - \bar{y}$ and $y - Y_x$ are independent.
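The identity between the total sum of squares and its two parts may be verified numerically as sketched below, using for y the mean lives and for x the B values of the nine wires (means of the two end readings of Table I). The variable names are ours, and the fitted constants are those given by this calculation rather than those printed by the authors.

```python
# B (mean of the two end readings) and mean life for the nine wires of Table I.
B = [21.0, 22.5, 14.0, 19.0, 23.5, 16.5, 16.0, 25.0, 18.5]
L = [1760, 1545, 1365, 1795, 1380, 1050, 1715, 2140, 1105]

n = len(L)
mean_b = sum(B) / n
mean_l = sum(L) / n

# Least-squares line Y = a + b * x.
b = (sum((x - mean_b) * (y - mean_l) for x, y in zip(B, L))
     / sum((x - mean_b) ** 2 for x in B))
a = mean_l - b * mean_b
fitted = [a + b * x for x in B]

total_ss = sum((y - mean_l) ** 2 for y in L)
regression_ss = sum((f - mean_l) ** 2 for f in fitted)
residual_ss = sum((y - f) ** 2 for y, f in zip(L, fitted))
cross_term = sum((f - mean_l) * (y - f) for y, f in zip(L, fitted))

print(round(total_ss), round(regression_ss), round(residual_ss), round(cross_term, 6))
```

The product term comes out as zero apart from rounding, which is what allows the total to be split into the two sums of squares that are compared in the test.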

If now we are concerned with 3 variable characteristics instead of
2, we may write these as $y$, $x_1$ and $x_2$, or in the present problem
L, A and B. The situation must now be represented in 3 dimensions,
and the regression equation

$$Y_x = a + b_1 x_1 + b_2 x_2 \qquad (4)$$

is that of a plane fitted to the points $(y, x_1, x_2)$. Fig. 8 shows a
perspective drawing of a model* of the lamp data, similar to that
shown in Fig. 4 above.

[Fig. 8. Model of lamp data.]

* I am much indebted to my draughtsman, Mr. J. G. Lee, for the construction
of this very pretty model and for the drawing.

The regression equation which I have
calculated (differing slightly from that of Dudding and Jennett) is

$$L = 4034 - 11.15\,A + 44.41\,B. \qquad (5)$$

If it is rewritten in the form

$$L = 754 + 11.15\,(350 - A) + 44.41\,(B - 14) \qquad (6)$$

we see how the model is constructed and the interpretation of its
parts. The plane rests on one series of triangles, which in turn stand
across another lower series. Thus the life of a lamp may be regarded
as built up of four portions.

(i) A base value of about 750 hours.*

(ii) A contribution 11.15 (350 - A) due to the A factor, equalling
zero when A = 350, but increasing as A decreases.

(iii) A contribution 44.41 (B - 14) due to the B factor, equalling
zero when B = 14 and increasing with B.

(iv) A residual deviation represented by the perpendicular distance
between the centre of the ball representing the observation
and the plane, positive when the ball is above the plane
and negative when it is below.

The two triangles seen in the model show the average effect on
L of, (1) a unit increase in A when B is constant, and (2) a unit increase
in B when A is constant. The quantities -11.15 and 44.41 are
termed the partial regression coefficients.
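The fitting of the regression plane and its division into the portions listed above may be sketched as follows. The data are the nine wires of Table I, with A and B taken as the means of the two end readings; the function name and the solution of the two normal equations by determinants are ours, the paper not stating how the arithmetic was carried out. The coefficients obtained in this way lie close to the partial regression coefficients quoted, though agreement to the last figure is not to be expected.

```python
# A and B (means of the two end readings) and mean life for the nine wires of Table I.
A = [274.5, 310.5, 299.0, 267.5, 323.5, 340.0, 288.0, 294.0, 318.5]
B = [21.0, 22.5, 14.0, 19.0, 23.5, 16.5, 16.0, 25.0, 18.5]
L = [1760, 1545, 1365, 1795, 1380, 1050, 1715, 2140, 1105]


def fit_plane(x1, x2, y):
    """Least-squares plane y = a + b1*x1 + b2*x2, solved from the normal equations."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    s11 = sum((u - m1) ** 2 for u in x1)
    s22 = sum((v - m2) ** 2 for v in x2)
    s12 = sum((u - m1) * (v - m2) for u, v in zip(x1, x2))
    s1y = sum((u - m1) * (w - my) for u, w in zip(x1, y))
    s2y = sum((v - m2) * (w - my) for v, w in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    b1 = (s1y * s22 - s2y * s12) / det
    b2 = (s2y * s11 - s1y * s12) / det
    return my - b1 * m1 - b2 * m2, b1, b2


a, b_a, b_b = fit_plane(A, B, L)

# Base value of the model: the height of the plane at A = 350, B = 14.
base = a + b_a * 350 + b_b * 14

# The four portions for the first wire of Table I.
contribution_a = -b_a * (350 - A[0])
contribution_b = b_b * (B[0] - 14)
residual = L[0] - (base + contribution_a + contribution_b)

print(round(b_a, 2), round(b_b, 2), round(base))
```

Writing the plane about the point A = 350, B = 14 changes only the constant term; the two slopes are unaffected.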

Again, it is important to know whether the slope of the regression
plane is real (in the sense discussed above when dealing with the
regression line), seeing that only 9 sets of observations (L, A, B)
have been used in determining it; the statistician judges the significance
by comparing a sum of squares based on the contributions
(ii) and (iii) above, with the sum of squares of the residual portions
(iv). Similarly, by another comparison of sums of squares he may
test separately the significance of the two partial regression coefficients,
i.e. he may ask whether, if further experiments were
carried out, the two triangles shown in the model, which tilt the
regression plane from the horizontal, would be of approximately the
same slope.

The methods of analysis of variance, which will be referred to 
later in the discussion, consist in an arrangement of the data in the 
most convenient form for the comparison of the appropriate sums of 
squares. I hope that for some of those who are not experts, the 
geometrical analysis illustrated in this model may give an added 
reality to the arithmetic procedure, by providing a physical picture 
of the situation. 

There is one final point that I would like to emphasize; that is,
the difference between, (1) the total regression coefficients of L on A
or L on B that are determined by considering each pair of characters
separately (e.g. in the lamp example, the constants -12.5 and 40.5
of p. 5), and (2) the partial regression coefficient of L on A for B
constant and L on B for A constant (-11.15 and 44.41 in the lamp
example), obtained from using the whole data to calculate the
regression plane. From the practical point of view, conclusions
which are seriously misleading regarding the relative importance of
factors A and B in controlling L might be drawn from considering the
former coefficients only.

* The base value is arbitrary. It would be better taken at the mean or
1,540, but that would have introduced difficulty in the construction of the
model.

To make this point clear, I have taken the example illustrated in
Fig. 9,* which shows in diagrammatic form (in a plan and one elevation)
a model similar to that of Fig. 8. The elevation represents the
correlation distribution of two characters, L and A; there is certainly
a significant correlation between them, and the regression line, if
calculated, would show clearly how on the average L increased with
A. But if we were to conclude from this that in order to reduce the
variation in L our best course would be to control the variation in
character A, we should be wrong. As seen in the plan, there is a
considerable correlation between A and B, and the shading of the
circles shows that for a fixed value of B there is no clear increase or
decrease in L with A. In fact, the partial regression of L on A for B
fixed is zero, or at any rate, very small. The proper course, therefore,
would be to concentrate, if possible, on the control of the B, and not
the A character.

[Fig. 9. An Illustration of the Meaning of Partial Regression.]

* The data are different from those of Dudding and Jennett.

In the lamp example the situation is less extreme, but the calculation
of the regression plane with its partial regression coefficients
does even here show that the B character is, compared with A, of less
importance in influencing life than a study of Figs. 2 and 3 alone might
suggest.

Mr. F. Yates : I should like to begin by adding my confirmation 
to what Professor Pearson has already said : although Mr. Dudding, 
in the paper he presented us, stated that he was no statistician, I 
think the paper itself contains sufficient evidence that he is in fact 
quite a competent statistician, and has extracted from these data 
almost all the information that it is possible to extract. The only 
comment that can be justly made is that there is a certain indirectness
of approach. Where the experienced statistician might run
straight in, because he had done the job before, the authors of this 
paper work round until they approach the centre point, but they 
arrive there all right. 

I will take up the tale where Professor Pearson left off. He has 
given you a diagrammatic demonstration of the various components 
into which the variability may be resolved. I propose to set out the 
arithmetic of the process, appropriately known as the analysis of 
variance, and to discuss its interpretation in this particular case.

The actual analysis of variance is as follows : 

                    Analysis of Variance
               (in units of means of 10 lamps).

                                   D.F.        S.S.        M.S.
    Between wires:  Regression       2       799,700     399,800
                    Remainder        6       200,400      33,400
                    Total            8     1,000,100     125,000
    Within wires                    79       401,900       5,100


The total sum of squares in this table represents the sum of the 
squares of the deviations of the mean lamp-life values from their 
general mean: these deviations will be the distances on the model of 
the points from a horizontal plane situated at a distance above the 
base equal to the mean lamp-life. 

The remainder sum of squares represents the sum of squares of 
the deviations (or vertical distances on the model) from the regression 
plane. 

The sum of squares due to the regression is the difference of these 
two sums of squares, and is also equal to the sum of squares of the 
vertical distances (for all the points) between the regression plane 
and the mean plane. This regression sum of squares can be quickly 
calculated, and in practice the remainder sum of squares is derived 
by subtraction. 

Clearly the magnitude of the total sum of squares depends on the 
number of values that are included, as well as on the magnitude of 
the variability affecting each. Actually with constant variability 
the magnitude is proportional to one less than the number of values. 
This number (here 8) is called the number of degrees of freedom, and
indicates the number of independent deviations included in the sum
of squares. Since the deviations are here measured from the mean
of the values, only 8 of the 9 are independent, the sum of all 9 being
necessarily zero.

In a similar manner there are 2 degrees of freedom associated with 
the regression sum of squares, since the regression involves two 
constants (other than the mean) which are determined from the data. 
This leaves 6 degrees of freedom for the remainder sum of squares. 

Dividing the sums of squares by the appropriate degrees of 
freedom gives the mean square column. The mean squares provide 
estimates (as illustrated in more detail later) of the variances (i.e. the
squares of the standard errors) of the various components of the
variability to which the material is subject. If the variability of the
material were entirely due to random causes, all three mean squares 
would on the average be equal. Here the remainder mean square 
is very much smaller than the regression mean square, indicating 
that the regression is a real one. 

Naturally with small numbers of degrees of freedom the variations 
in the mean squares due to chance causes are large, and consequently 
an exact test is required to determine whether in any particular case 
an observed difference may be regarded as real. This is provided by 
the z test. 
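As a minimal sketch of the arithmetic just described, the mean squares can be recomputed from the printed sums of squares and degrees of freedom, and Fisher's z (half the natural logarithm of the ratio of two mean squares) formed from them. The dictionary layout and function name are illustrative assumptions; no significance levels are computed here, since those would be read from tables of z.

```python
import math

# Sums of squares and degrees of freedom as printed in the analysis of
# variance (units: means of 10 lamps).
table = {
    "regression": (799_700, 2),
    "remainder": (200_400, 6),
    "within wires": (401_900, 79),
}

# Mean square = sum of squares / degrees of freedom.
mean_square = {name: ss / df for name, (ss, df) in table.items()}

def fisher_z(ms_larger, ms_smaller):
    """Fisher's z: half the natural log of the ratio of two mean squares."""
    return 0.5 * math.log(ms_larger / ms_smaller)

# Regression against remainder: is the regression real?
z_regression = fisher_z(mean_square["regression"], mean_square["remainder"])
# Remainder against within wires: does the regression explain all wire differences?
z_remainder = fisher_z(mean_square["remainder"], mean_square["within wires"])

for name, ms in mean_square.items():
    print(f"{name:13s} mean square = {ms:10,.0f}")
print(f"z (regression v. remainder)   = {z_regression:.3f}")
print(f"z (remainder v. within wires) = {z_remainder:.3f}")
```

The within-wires mean square comes out at about 5,100, the figure used in the later estimates of variance.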

The last line of the table, “within wires”, is calculated from the
values of the authors’ Table IIIb, being based on the deviations of
each lamp-life from the mean of that particular wire. The sum of
squares of these deviations is divided by 10, because we are working in
units of the mean of 10 of these values. Although the regression
has accounted for a good deal of the variability, the residual
variability 33,400 (6 D.F.) is significantly greater than the variability
within wires, indicating that the regression does not account for the
whole of the differences between wires. The observations are a little
scanty to attempt to locate the causes of this residual variability, but
it is worth noting that the fitting of separate regressions to the OAS
and OAZS groups accounts for the whole of it, the regression
coefficients on factor A being significantly different for these two groups.
These differences might be worth further experimental investigation. 

I cannot quite follow the authors where they calculate the reduction
in variation of lamp-life due to the control of factors A and B.
If the sample of wires were a random one, we could proceed as follows, 
using the results of the analysis of variance already set out. 


    Estimate of                                        Without      With
                                                       Control.     Control.

    Variance of means of 10 lamps                      125,000      33,400
    less part due to variation within wires              5,100       5,100
    Variance due to differences between wires only     119,900      28,300
    Variance of single lamp due to variation
      within wires                                      51,000      51,000
    Total variance of single lamp                      170,900      79,300

Thus control has reduced the estimated variance to just about one
half (and the standard error to 1/√2) of its original value. By “total
variance of a single lamp” is meant the variance that would be
observed if a number of lamps from several different wires were 
included in a batch. 
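The steps of this estimate can be retraced in a few lines, assuming (as the text does for the moment) that the wires were a random sample. The rounded mean squares from the analysis of variance are used; the function and constant names are illustrative.

```python
import math

LAMPS_PER_MEAN = 10   # each wire value is the mean of 10 lamps
MS_WITHIN = 5_100     # within-wires mean square, in units of means of 10 lamps

def single_lamp_variance(ms_between):
    """Total variance of a single lamp, given a between-wires mean square."""
    # Remove the part of the between-wires mean square that is due to
    # variation within wires.
    between_wires_only = ms_between - MS_WITHIN
    # A single lamp varies 10 times as much as a mean of 10 lamps.
    within_single_lamp = MS_WITHIN * LAMPS_PER_MEAN
    return between_wires_only + within_single_lamp

without_control = single_lamp_variance(125_000)  # total mean square between wires
with_control = single_lamp_variance(33_400)      # remainder mean square after regression

ratio = with_control / without_control
print(f"without control: {without_control:,}")
print(f"with control:    {with_control:,}")
print(f"variance ratio {ratio:.3f}, standard error ratio {math.sqrt(ratio):.3f}")
```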

In this case the wires were selected, and therefore the variances of 
the first line in the above table are inflated. There is no way of 
determining the true value of the second of these, but assuming that 
it is 33,400, the first may be determined by adding this to the variance 
of the values of L determined from the regression for all the wires 
sampled from the factory (those of Fig. 1). For this the authors 
obtain the value of 230² or 52,900. The total is therefore 86,300,
instead of 125,000. The total variance of a single lamp is therefore
132,100 (S.D. ± 363 hrs.), and is reduced to 79,300 (S.D. ± 282 hrs.)
by complete control of factors A and B.*

The other point I want to make about the data is that the 
authors could have obtained the same results with little or no 
special experiment. Let us turn to Table II. This represents data 
collected in the ordinary course of factory work. Regressions can 
be fitted in just the same manner as to the experimental data. I 
have done this to the last group of data (the only group that exhibits 
any variation in factor A). The resultant equation is : 

    L = 4,346 − 12.7 A + 49.4 B
                ±5.50    ±35.4

with a residual mean square of 68,500. This is almost identical
with the experimental results :

    L = 4,057 − 11.2 A + 44.4 B
                ±2.74    ±17.4

only the coefficients are determined with considerably less precision. 
Extra precision could have been attained by further observations on 
the same lines. 
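One rough way to see how closely the two equations agree is to express the difference between corresponding coefficients in units of its standard error. The sketch below assumes the two sets of estimates are independent, so that their variances add; that assumption, and the function name, are not part of the original argument.

```python
import math

def difference_in_standard_errors(coef_1, se_1, coef_2, se_2):
    """Difference of two independent estimates divided by its standard error."""
    # Variances (not standard errors) add, hence the root of the sum of squares.
    return (coef_1 - coef_2) / math.hypot(se_1, se_2)

# Coefficients and standard errors as quoted: Table II data, then the experiment.
d_A = difference_in_standard_errors(-12.7, 5.50, -11.2, 2.74)
d_B = difference_in_standard_errors(49.4, 35.4, 44.4, 17.4)

print(f"coefficient of A differs by {d_A:+.2f} standard errors")
print(f"coefficient of B differs by {d_B:+.2f} standard errors")
```

Both differences are a small fraction of a standard error, consistent with the remark that the equations are almost identical and differ mainly in precision.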

This illustrates two points : it is usually much more economic to 
conduct research of this type by observing factory material which has
to be observed anyhow; and the results, when obtained, are really
more what is wanted, since there is no certainty that in special
experiments conditions are really comparable with those of
manufacture. The authors are well aware of this themselves; they have
said they were not happy that their conclusions were true until they 
had examined the actual factory results and found that they confirmed 
the equation obtained from the experimental results. In other words, 
having taken great trouble to eliminate all sources of variation, they 
then tended to regard their results with suspicion just because of that 
extra trouble, and sought confirmation from the ordinary factory 
tests—a very sensible attitude. 

This possibility of conducting observations and experiments
within the framework of the ordinary manufacturing processes is
not, I think, sufficiently appreciated. The advantages of special
experiments are that conditions can be much more widely varied,
controlled and simplified. But the crucial test is that of practice.
Much of the discredit that has been earned by laboratory science is
due to the neglect of proper practical tests.

* The calculations contained in this paragraph were not presented at the
meeting.

Returning to our regression equations, it is interesting to see how 
the first two groups of Table II fit these equations. The actual mean 
values are: 


    Group.                        A.       B.       L.     Predicted L.
    Treated at 0.9               349     17.2      958         761
    Treated at 0.9 and 0.5       355     16.9     1004         673


The predicted mean values of lamp-life are decidedly lower than 
the observed values. This indicates the danger of exterpolation from 
regression equations. 
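The comparison can be retraced with the regression equation fitted to the last group of Table II. The coefficients below are the rounded values printed in the text, so the predictions agree with the tabulated ones only to within a few hours; which equation was actually used for the table is an assumption here, and the function name is illustrative.

```python
def predicted_life(A, B, intercept=4346.0, coef_A=-12.7, coef_B=49.4):
    """Lamp-life predicted from the (rounded) equation L = 4,346 - 12.7 A + 49.4 B."""
    return intercept + coef_A * A + coef_B * B

# Mean values of the first two groups of Table II: (A, B, observed L).
groups = {
    "Treated at 0.9": (349, 17.2, 958),
    "Treated at 0.9 and 0.5": (355, 16.9, 1004),
}

for name, (A, B, observed) in groups.items():
    predicted = predicted_life(A, B)
    print(f"{name:24s} observed {observed:5d}  predicted {predicted:6.0f}  "
          f"shortfall {observed - predicted:5.0f}")
```

In both groups the predicted life falls short of the observed mean by some hundreds of hours.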

* Substantially, therefore, as I said in my opening remarks, my 
conclusions are at one with the authors' except as regards the 
effect of control in reducing the variability. I see, however, that 
they hoped for much more from the statistician. On the last page of 
their address they say, “We have always had in mind that, although
the values shown in the graphs, on which the calculation of the 
correlation coefficients is based, represent a number of individual 
tests, little use has been made of this fact. The real difficulty lies in 
the fact that no particular individual result from one test can be 
associated with an individual result from another. In tests to 
destruction it is obvious that not more than one test can be applied 
to any particular piece of wire. 

“ Hoping that statisticians may show us how to make more 
efficient use of our results, we give all the individual readings for the 
planned experiment here.” 

These remarks indicate a very common frame of mind, especially 
amongst those new to handling variable experimental material 
(particularly biological material). In such experiments, there is 
usually a very large mass of numerical data. The primary job of the 
statistician is to extract the relevant information from these data, and 
to discard what is not relevant to the point or points at issue. Let me 
quote a remark of Fisher's on this point: 

“ The statistician must be treated less like a conjurer whose 
business it is to exceed expectation, than as a chemist who undertakes 
to assay how much of value the material submitted to him contains.” 

It is here that the concepts of efficient and sufficient statistics are
of importance. An efficient statistic is one which, in large samples,
has the smallest possible standard error. In small samples all
efficient statistics are not equivalent, but in certain cases there exists
a sufficient statistic which contains the whole of the relevant
information. Thus in the case of a sample from a normal distribution the
mean of the sample is a sufficient estimate of the mean of the
distribution, and if we are only interested in the mean, the sample values
may be immediately discarded in favour of their mean.

* The remainder of this contribution, though prepared before the meeting,
was not then delivered. In adding it here I must plead the invitation of Mr.
Dudding and others to amplify and extend remarks at the meeting.

The separate observations on one wire for factors A and B in 
the authors' material are probably not distributed normally, and
consequently the mean is not necessarily the best statistic, even for
estimating the mean. It is not, however, likely to be a bad statistic
for the purpose in hand. But the actual choice of a statistic is
probably of little practical importance. Mr. Dudding and Mr. Jennett
have chosen to use the median. The median and the mean could
be compared by their success for the purpose for which they are to
be used—namely, lamp-life. But it would probably require a very 
much larger collection of data than is at present available to decide 
between them. 

It may be that variation in the factor A or B within a wire may 
affect average lamp-life. To test this the variance or range or any 
other suitable measure of variation of each set of five measurements 
could be taken, and included as a separate variable in the partial 
regressions. If the variation in the test A or B within a wire was 
entirely due to experimental errors, and not to variation in the factor, 
then no regression would result. If it represented in whole or in 
part a real property of the wire, then a regression would or would not 
result, according as this property did or did not affect life. The 
dependence of the variability in lamp-life on factors A and B or their 
variability could be investigated in a similar manner. A very much 
larger amount of material would, however, be required to test points 
such as these. 

The only other information that is likely to be contained in the 
separate values of A and B, or of lamp-life within a wire (Table III),
is information on the distribution of these quantities, which in turn
depends on experimental errors and on local variability of the wire.
These points are not at issue at the moment, and until definite
questions are formulated it is impossible to say if the material is likely to
provide any relevant information. 

We may therefore conclude that as far as the authors' immediate
practical problem is concerned, the statistics they have used are 
likely to be quite adequate. It only remains to note that any 
actual inadequacy would not lead them astray (though relevant 
information would be wasted), since their regression equations are 
based on the statistics they use, and will make due allowance for 
errors in these, as well as for experimental errors. 

Mr. Welch said the point he wanted to consider was not concerned
directly with the experiment that had been described that evening,
but rather with what ought to be done when the result of the
experiment was known. Two things had been demonstrated: (1) that if
wires could be made more uniform in respect of the characters A and
B, then the lamps would be more uniform in life, and (2) if wires could
be made which had lower values of the character A, and higher values
of the character B, than they had at present, the length of lamp life
would be improved. The problem now to be faced was really a works
problem as distinct from a research laboratory problem, and what was
wanted was to improve the quality of the wire in the respects which
the experiments had indicated as desirable.

Mr. Welch was sure that there were people present who had had 
to deal with this side of the problem, and he thought it would be 
extremely interesting if they could tell the meeting whether any steps 
they were taking in the factory were such as to produce these desirable 
improvements in wire quality. 

He proposed to confine himself to the results given in Table IIIa.
There were 49 wires with corresponding sets of values of the qualities
A and B, and from these 49, 9 were chosen to make up into lamps for
the experiment. When the authors were considering the problem of
which wires to choose, they came to the conclusion that the wire
qualities were not under control in the statistical sense, and he wanted
to go further into this point. He wished to employ the data to
illustrate the use of what had been termed control charts. Such
charts had been described previously to the Society, and they
consisted essentially of plotting values at intervals of time or space.

Fig. 10.

Mr. Welch showed a diagram (Fig. 10) in which were plotted the 
characters A and B for the 49 wires. A division was made into the 
8 different times on which the wires were sampled from the factory. 
Starting with the character B; if the manufacture of wire were under
statistical control, then after about 30 test values were obtained, it 
would be possible to estimate the mean and standard deviation of 
the population from which these wires had been drawn. The first 
four times of sampling taken together provided 26 observations, and
these had been used to obtain such estimates. The mean had been
put in on the diagram and also limits outside which one would
expect 1 in 20 observations to lie. These limits were based on normal
theory, this having often been found applicable when characters were 
under statistical control. It would be seen that for the first 26 
observations there was only one outside the limits, and that was quite 
in accordance with theory. When one came on to the later period, 
keeping the same limits, out of the next 23 observations, again only 
one was found outside, and this was no more than one would expect. 
Therefore as far as character B was concerned, the data demonstrated 
that there was nothing to show that the product was not under 
statistical control. 

Another question was that, having got so far, one wanted to 
increase the average value of B, because that had been shown to give
longer life of lamps. In order to do that it was necessary to consider 
wires which gave extreme values of B. For if causes which tend to 
produce such extremes could be discovered, the knowledge could be 
used to improve the general level of the product. 

Character A had been considered in the same way as character B. 
From the first 26 observations the mean and standard deviation had 
been estimated, and limits drawn beyond which 1 in 20 observations 
would be expected to lie. In addition, wider limits were drawn for 
which the expectation outside was 1 in 100. As before, there was 
nothing in the first period to indicate lack of control. In considering 
the samples taken on the 23rd May, however, it appeared that the
level of control was not being maintained. The values were on the
whole larger and the variability greater. This might possibly be due
to default in the method of sampling, but this was unlikely, as we were
told that the sampling was random, and, furthermore, at the later
dates these features were reproduced. For the wires taken on 26th
May, out of 6 values, there were 5 above the mean. 
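The construction of such limits can be sketched as follows. The readings are hypothetical (the values of Table IIIa are not reproduced here), the limits rest on normal theory as in the text, and the function names are illustrative assumptions.

```python
from statistics import NormalDist, mean, stdev

def control_limits(baseline, outside_fraction):
    """Normal-theory limits outside which the given fraction of values should fall."""
    # Two-sided limits: half of the outside fraction lies in each tail.
    z = NormalDist().inv_cdf(1 - outside_fraction / 2)
    centre, spread = mean(baseline), stdev(baseline)
    return centre - z * spread, centre + z * spread

def count_outside(values, limits):
    """Number of values falling outside the (low, high) limits."""
    low, high = limits
    return sum(1 for v in values if v < low or v > high)

# Hypothetical readings of one character: an early period used to set the
# limits, and a later period judged against them.
early = [348, 352, 350, 347, 353, 351, 349, 350]
later = [354, 356, 352, 355, 357, 349]

limits_1_in_20 = control_limits(early, 1 / 20)
limits_1_in_100 = control_limits(early, 1 / 100)

n_outside_20 = count_outside(later, limits_1_in_20)
n_outside_100 = count_outside(later, limits_1_in_100)
print("1 in 20 limits: ", limits_1_in_20, "later values outside:", n_outside_20)
print("1 in 100 limits:", limits_1_in_100, "later values outside:", n_outside_100)
```

With the limits held fixed from the early period, a later run of values lying mostly beyond them is the kind of evidence of lost control described for character A.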

Mr. Tippett thought it was very sporting of Mr. Dudding and 
Mr. Jennett to put their experiments before the Section, and to allow 
the statistical searchlight to be brought upon them, but he did not 
think they need have any misgivings, as most of those present had 
a real appreciation of the true worth of the work they had done. At 
the same time, he felt he would make the most use of his time if he 
paid little attention to points of agreement and centred upon one or 
two small points of difference. 

He would like first to turn to the remarks on p. 6 on the practical 
application of results. That was an important stage, because one 
could not expect the director or works manager to appreciate correlation
coefficients, significances and so on; the value of the experiment
would be largely lost unless the results were put in a form which the 
practical man could understand. He thought that ultimately the 
results were reducible to these measures of standard deviation 
referred to. He did not quite agree that the best method of
presentation was to state that there was a standard deviation of 165
hours due to variations between wires alone, and of 210 hours due to
variation within a wire, because the person told this would be apt to
think there was a total standard deviation of (165 + 210). Standard
deviations were not additive, and it was necessary to go back to the 
squares of the standard deviations—the variances. That was what 
Mr. Yates had done where he had produced means of squares; Mr. 
Tippett would have gone further, and extracted the square roots. 
In the analysis given by Mr. Yates, however, the component variances 
had been obtained under artificial experimental conditions, and 
modified estimates would have to be used before applying the results 
to factory conditions. He did not propose to show exactly how he 
would do it, but he would put different values of mean squares from 
Mr. Yates. 
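The point about additivity is easily checked with the two figures quoted, 165 and 210 hours, taking the two sources of variation to be independent. The function name is an illustrative assumption.

```python
import math

def combine_standard_deviations(*sds):
    """Combine independent sources of variation: variances add, not standard deviations."""
    return math.sqrt(sum(sd ** 2 for sd in sds))

between_wires, within_wire = 165, 210   # hours, as quoted from the paper

total_sd = combine_standard_deviations(between_wires, within_wire)
naive_sum = between_wires + within_wire
print(f"combined standard deviation: {total_sd:.0f} hours")
print(f"naive sum of the two:        {naive_sum} hours")
```

The combined figure is about 267 hours, well short of the 375 hours that simple addition would suggest.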

In the third paragraph on p. 6, the standard deviation for the 
average results of batches of 10 lamps predicted from the regression 
equation was given as 230 hours. Mr. Tippett did not think that 
needed correction for standard deviation within a batch, as had been 
done in the paper. 

Another point to which he would like to refer was the efficiency 
of the design of the experiments. He felt that 9 wires were rather 
few, and he began to wonder if by doing fewer tests on each wire it 
would have been possible to have had more wires and to have increased 
the statistical significance of the results. The general question of 
the efficiency of the experimentation then arose. Agriculturists had 
helped a great deal, but there was one element that had to be dealt 
with which they did not meet—the different times taken by the 
various tests. Here there were three directions in which the
experimenter had to distribute his resources : (1) the tests of factors A and
B on the wire, (2) the number of lamps per wire that he would
actually coil, and (3) the number of lamps per wire that he would
complete and test; and to decide the optimum distribution of
resources there were three factors to be considered :

(1) The relative expense involved in doing a single test of each
kind. If it was comparatively easy and cheap to do a test of factors 
A and B, it would pay to do a lot; if, on the other hand, it was 
expensive to test the lamps, then one would want to cut down the 
number of lamps tested. 

(2) The degree of variation between duplicate tests of each kind. 
The lamp life varied a lot, and although it was expensive to do the 
tests, several tests per wire were necessary to get sufficiently accurate 
results. 

(3) In the paper it was suggested that it was necessary to coil
50 m. of wire. This was a technical factor, but Mr. Tippett took it 
that this was a minimum, and that it was impossible to coil less than 
ten lamps per wire. 

Subject to technical limitation, the first two factors, expense 
involved in the tests and variations in the tests, could be balanced up 
and an optimum distribution of resources could be worked out. On a 
rough calculation it would appear that if 3 lamps were tested for life
for every wire, the correlation coefficients would be reduced from
−0.78 and +0.43 to −0.73 and +0.40. By reducing the number
of lamps tested for life, there was very little reduction in correlation
coefficient. If, to balance that, the number of wires could be doubled,
the loss in correlation would be more than made up by the increased
number of points, and the significance of the results would be
increased.

Actually Mr. Tippett thought it would be well if investigators 
would consider the advisability of doing these things on a commercial 
scale in the factory, because it was easy to get a large number of 
observations with comparatively little expense, and what was lacking 
in accuracy of individual points was made up in quantity. The 
physicist always wanted every point to be accurately determined; 
the statistician wanted lots of points, and did not bother so much 
about the accuracy of individual points. In designing an efficient 
experiment, a balance had to be reached and both sides of the question 
to be borne in mind. 

[Mr. Tippett later added the following : 

Mr. Yates has been good enough to show me the revised manuscript 
of his remarks, and he has covered the points I raised about the 
calculation of variances for factory conditions. I think the result
given by him, that the standard deviation of life of lamps with factors
A and B uncontrolled is 363 hours and with them controlled is 282
hours, can be appreciated by a works manager. Actually, this
statement does not quite apply to factory conditions because the wires
were coiled under special conditions for the experiment, but similar 
calculations could be made on data obtained from a factory.] 

Dr. Neyman complimented the authors on their very interesting 
work, and pointed out that once the regression of the length of life 
of the lamps on the two items A and B proved significant, there was, 
possibly, a not very expensive way of limiting the variability in 
the length of life. 

The obvious way was to check the variability of both factors 
A and B. This, however, might be expensive, especially if one or 
both these factors were some characters of the raw material. 

The cheaper method was to adjust in a certain manner the value
of one factor, say A, to the value of the other, B, which might vary
within the usual limits. If the regression equation of L on A and B is

$$L(A, B) = \bar{L} + a(A - \bar{A}) + b(B - \bar{B}) \qquad (1)$$

then the method of adjustment is obtained by writing

$$\bar{L} + a(A - \bar{A}) + b(B - \bar{B}) = C = \text{const.} \qquad (2)$$

where C is any convenient number, e.g. $C = \bar{L}$.

In cases where any adjustment is possible at all, e.g. if A and B
are characters of two different kinds of raw material, or if one of
them, say A, is some controllable factor in the process of manufacture,
such as temperature or pressure, we could solve the equation (2) with
regard to A and get

$$A = \bar{A} + \frac{1}{a}\left\{C - \bar{L} - b(B - \bar{B})\right\} \qquad (3)$$

where the right-hand side represents the value of A such that by the
given value of B, the corresponding average value of L will be exactly
equal to C. Thus if B is varying in an uncontrolled manner, and if,
having ascertained the value possessed by B in any particular case,
we are able to impose on A the value calculated from (3), we shall
reduce the variation in L by the quantity due to the variability of
both A and B.
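A minimal sketch of this adjustment, solving the regression equation for A at a chosen constant C, is given below. The coefficients a and b are the experimental partial regression coefficients quoted earlier in the discussion; the mean values of A, B and L are hypothetical placeholders, and the function names are illustrative.

```python
def regression_L(A, B, a, b, A_bar, B_bar, L_bar):
    """Average L given A and B, from the regression plane."""
    return L_bar + a * (A - A_bar) + b * (B - B_bar)

def adjusted_A(B, C, a, b, A_bar, B_bar, L_bar):
    """Value of A that brings the average L to C for the observed B."""
    return A_bar + (C - L_bar - b * (B - B_bar)) / a

# a, b: coefficients quoted for the experiment; the means are hypothetical.
params = dict(a=-11.2, b=44.4, A_bar=350.0, B_bar=17.0, L_bar=900.0)
C = params["L_bar"]   # convenient choice of the constant

for B in (16.0, 17.0, 18.0):
    A = adjusted_A(B, C, **params)
    L = regression_L(A, B, **params)
    print(f"B = {B:4.1f}  ->  set A = {A:7.2f}, average L = {L:6.1f}")
```

Whatever value B takes, the compensating value of A returns the same average L, which is the sense in which the variation attributable to both factors is removed.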

Of course in many cases this method may be impracticable, but 
it may be useful in others. It is discussed in some detail in a recent 
article.* 

Mr. M. S. Bartlett said he would like to add his congratulations
to those offered to the authors for the sound statistical sense they had 
put into the paper. He did not think he had much in the way of 
criticism to add to what had already been said—in fact, when earlier 
in the paper the authors mentioned that they had been obliged to 
take as few as nine values in order to get their regression equation, 
they certainly showed common sense in definitely selecting them and 
taking them over as wide a range as possible. 

On looking at the diagram, he thought that though they had 
rightly taken them round the outskirts, they might possibly have 
included one or two in the middle, simply to make sure that the plane 
was more or less uniform throughout the whole range. Although the 
regression equation could be obtained by selecting in that way, the 
authors also realized that when the importance of the variability 
present was considered afterwards, it was necessary to realize the 
effect of selection, and come back to the natural variation to be 
expected. 

Mr. Bartlett said that, like one or two previous speakers, he did 
not agree with the actual formula on p. 6. The authors seemed to 
have neglected the variation within wires, but even so he did not 
follow the − 160² in the square root.

Finally, he wished to say a word in general on questions of
investigations into variability. He was himself connected with work in
agricultural and biological experimentation, and there it had been 
found that the idea of investigating different sources of variation 
that arise—that is, of considering the variation due to different 
factors, tracking down the important ones and applying corrections 
for others—was extremely important. There was the same problem 
in industry in tracking down the different sources of variation and 
trying to control those which were most important, but it was
necessary to go one further in industry in order to get practical results. In
biology it could be said that if certain values, different from what
were actually recorded, were obtained, the results would be so and so.
Such a conclusion might be made in industry, but there it was
essential to consider whether the variability could actually be 
controlled. That was, however, a problem for the engineer, not the 
statistician. 

* J. Neyman, “Problems of Chemical Engineering requiring the Application
of Statistical Methods,” Polish Forestr. Agric. Journal, Vol. XXXIII, 1934.





The Chairman said that before calling upon Mr. Dudding and
Mr. Jennett to reply, he would like to express the most sincere thanks
of the Section to the partnership for stimulating an admirable
discussion. The paper had certainly fulfilled the hopes which the
Committee had when the circular was sent out to various manufacturers
in the summer, and he would like to ask any others present who were 
in a position to produce problems, if they would like to put them up for 
discussion at subsequent meetings, to let the Secretary have an 
account of them; they could then be discussed with Prof. Pearson 
and Mr. Yates, and it might be possible to stage a similar evening. 

It was usual to allow the speakers to reply to the discussion in 
writing, but it would be quite in order to close the proceedings with 
a duet by Messrs. Dudding and Jennett; if, however, the authors 
preferred to reserve their remarks for the Supplement, it was open to
them to do so. 

He asked the meeting to accord a hearty vote of thanks to Mr. 
Dudding and Mr. Jennett. 

The Authors' reply (amplified in writing) was as follows :

We should like to preface our reply by expressing our indebtedness 
to those who have discussed the data contained in the paper. 

We should have been more pleased if a larger number of industrial 
workers were present, but perhaps when the discussion is published 
it will stimulate greater interest in further discussions of the same 
type. 

There have been a number of points dealt with which are 
precisely those which will arise when any worker attempts to apply 
statistical methods to industrial problems. 

For example, attention has been repeatedly drawn to a mistake 
made in our attempt to estimate how much the variability of the 
product will be reduced by controlling factors A and B. 

We assumed that the figure 230² obtained by calculating, from
the regression equation, the lives of the 49 wires tested for properties 
A and B included the residual variance, whereas it was actually the 
regression variance. 

Professor Pearson’s Figure 7 makes it particularly easy to
appreciate the real meaning of the calculated figure because it was clear
that we were calculating

$\Sigma(Y - \bar{y})^2$ and not $\Sigma(y - \bar{y})^2$.

Further, in addition to showing how clearly the analysis of 
variance table illustrates the important features of the problem, Mr. 
Yates, following a suggestion made by Mr. Tippett, has amplified 
his remarks to show how the analysis of variance can be utilized to 
estimate correctly the degree of improvement we sought to 
determine. 

Mr. Yates takes a little licence (a justifiable licence in the interest 
of discussion) in making his point concerning the possibility of getting 
the data we required from ordinary factory procedure. 

Whilst we wholeheartedly agree that analysis of factory data is
one of the most suitable fields for the application of statistical
methods, we wish to point out that the data of Table II are not
factory data. They are described as arising from normal experimental
work, to distinguish from the special experiment.

The actual facts are not relevant to the discussion, but the special 
experiment was necessary because :— 

(i) the factory executive would not have accepted the 
conclusions drawn from these data given in Table II because they 
did not concern factory material, 

(ii) as the work referred to the properties of wire treated in 
a special manner, it was desirable to reduce the possibility of 
uncontrolled variation in the subsequent processes, filament 
coiling and lamp making, from obscuring the effect the wire 
treatment has on lamp quality. 

Mr. Yates's point, that the experiment could have been done in
the works without the special conditions of the experiment, is demonstrated
by the result, but was unknown at the time the work was
commenced.

A similar comment applies to Mr. Tippett’s reference to the 
number of lamps tested. 

Here, however, an increase in the number of wires tested would 
increase the time and cost of the work proportionally, and a reduction 
in the number of lamps tested would effect a saving on the cost of 
the power used for life testing the lamps. In this case the former 
was the more important consideration.

Mr. Welch has taken the opportunity the data present of
illustrating how a control chart could be drawn which would keep the 
works personnel informed as to the constancy of the properties of 
the wires. Although nobody referred to this in the discussion, we 
know that the method is adopted in regard to property A. 

Dr. Neyman emphasizes an important principle that should not 
be overlooked when considering methods of reducing variability. 
We have applied the principle in other connections, but it is not 
applicable to this particular problem. 

In conclusion, we are not quite sure if Mr. Yates thinks we 
attribute conjurors’ tactics rather than chemists’ to the statistician, 
but we are satisfied that our appeal was rightly made because Mr. 
Yates, together with other contributors, has certainly responded 
to the appeal royally. 





The Specification of Rules for Rejecting too Variable a
Product, with Particular Reference to an Electric
Lamp Problem.

By B. L. Welch, B.A. 

(Department of Applied Statistics, University College, London.) 

1. Introduction. 

The present paper is based on data * and unpublished work which 
were used to guide those interested in controlling the quality of 
electric lamps when drawing up clauses specifying the manner in 
which sample lamps should be collected on the market and tested to 
furnish certain required information about the total product which 
these samples represent. The authors of this work drew upon 
factory routine test data accumulated over a period of rather more 
than a year, and in their report were able to show how statistical 
theory could usefully be applied in problems of this type. The 
principles involved are of such wide application that it seems useful 
to consider the matter further, giving first a general discussion of a 
situation which is rather simpler than the one they had to consider, 
and then treating their particular problem in a way which is somewhat
more rigorous from the statistical standpoint than the one they
followed.

Many processes in industry have as their object the production
on a large scale of articles which shall be alike in certain essential
characteristics. When such a characteristic is measurable, we may
specify some objective value of it which is most desirable and at
which the production process should aim. The product, however,
will never be absolutely uniform, but will vary a certain amount about
an average value. It must be recognized that this is inevitable,
however efficient the control of production. It is not therefore
sufficient to specify only average values, but it is necessary also
to indicate the permissible variation about such averages. There
are, in fact, situations where the primary consideration must be
given to the variability, and average values take a second place.
Such might be the case, for instance, in the manufacture on a large
scale of a component part of a certain article. The design of the
article may depend on the value of some characteristic of the component,
and different values of this characteristic may necessitate
small adjustments in design if the finished article is to be efficient.
Within limits it may not matter what the average value of the
characteristic is, as it may be no more difficult to design for one
value than for another. However, having decided on the design,
we must safeguard ourselves against having to make changes, or,
alternatively, making a less efficient article. Thus the components
must be made as uniform as possible.

* The material was compiled by Messrs. B. P. Dudding and W. J. Jennett
of the Research Laboratories of the General Electric Company. The thanks
of the author are due to them for placing it at his disposal and for their advice
and criticism during the preparation of this paper.

Even where the actual average value of a characteristic is impor¬ 
tant, it may still be wise to look first at the variability. If a producer 
can find causes which tend to make his product too variable, he will 
usually at the same time have obtained information enabling him 
to improve his general level. There is, then, sufficient justification 
for considering this question of variability at some length. 

2. The Standard Deviation Distribution.

The problem that will be considered here is as follows. How
shall we draw a sample and deduce some measure from it on the basis
of which we shall either accept a product as being sufficiently uniform,
or else reject it because the variation is excessive? This
problem may arise at various stages; a producer may wish to control 
the product of a certain machine so that the variability shall not 
exceed a certain level; a consumer may wish to assess the variability 
in a consignment of articles which is to be delivered to him; or a 
standardizing authority may wish to obtain information about the 
complete supply of a certain product which is on the market at a 
particular time. For the time being no reference will be made to 
these specific situations. The question will be considered more 
abstractly, and we shall suppose simply that there is some large 
“ population ” * of articles from which we can draw random 
samples, and we shall be concerned with the implication of different 
rules which may be set up. The assumption will be made that the 
distribution of values in the population follows the Normal law. 
It can then be specified completely by its mean (X) and standard
deviation ($\sigma$), and the values of these parameters may be said to
determine the level of control.† In particular, $\sigma$ determines the
variability about the mean, and we require to draw certain inferences
about its value from our samples. It has been shown that the best
statistical measure to calculate for this purpose is the sample standard
deviation

$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n}},$$

$\bar{x}$ being the mean and $n$ the number of observations in the sample.

* This general statistical term may be used to signify any large collection
of articles making up a batch, consignment, output during a given period, etc.

† The terms used here follow those given by E. S. Pearson in the British
Standards Institution Report No. 600, entitled “The Application of Statistical
Methods to Industrial Standardization and Quality Control.”

In repeated samples s can take all values from zero to infinity, 
the equation to the distribution curve being 


$$p(s)\,ds = \text{const} \times s^{n-2} e^{-ns^2/2\sigma^2}\,ds.$$

For small values of $n$ this curve is skew, but as $n$ increases it approximates
more closely to normality. In Fig. 1 the distribution has
been drawn out for the case $n = 10$, $\sigma$ having successively the
values 0.8, 1.0, 1.5 and 2.0. The constant is in each case so adjusted
that the area under the curve is unity, and therefore the chance of
obtaining a value of $s$ between two values $s_1$ and $s_2$ is given by the area
under the curve between these points. We shall denote this by
$p(s_1 < s < s_2)$. Such chances are evaluated by noting that $ns^2/\sigma^2$
follows the $\chi^2$ distribution

$$p(\chi^2)\,d(\chi^2) = \text{const} \times (\chi^2)^{f/2 - 1} e^{-\chi^2/2}\,d(\chi^2),$$

the number of degrees of freedom $f = n - 1$.*

Fig. 1. Distribution of Standard Deviation in Repeated Samples of 10
observations, from a Normal Population ($\sigma$ of Population = 0.8, 1.0,
1.5, 2.0).


* Tables that can be used in this connection are: (i) Tables of the Incomplete
Gamma-Function, edited by Karl Pearson; publishers, H.M. Stationery
Office; and (ii) $\chi^2$ Table, Statistical Methods for Research Workers, R. A.
Fisher; publishers, Oliver and Boyd.






3. Use of Standard Deviation as a Criterion.

Suppose that we have a sample of 10 observations and that we
calculate $s$ from it. We wish to fix upon some value such that if
$s$ is greater than this value, then we shall regard the product as
unsatisfactory with respect to variability. We shall refer throughout
to this sample rejection level as $s_0$. Consider, for example,
a case where we could certainly term the product satisfactory
if $\sigma$ were 0.8 or less. Then our first consideration is this. We
do not wish to run too great a risk of rejecting satisfactory
material. Now, if $\sigma = 0.8$, the value of $s_0$ such that $p(s > s_0) =
0.05$ is 1.04. Thus, if we reject when $s > 1.04$, we run a 1 in 20
risk of rejecting material for which $\sigma$ is 0.8, since we may obtain
such large values of $s$ simply by chance once in 20 times. The
risk is represented by the shaded area to the right of 1.04 in Fig. 1.
By increasing $s_0$ we shall decrease $p(s > s_0)$, and thus, by going far
enough, can reduce our risk to as small proportions as we like.
Another point, however, has to be taken into account. If $s < s_0$,
we shall accept our material as satisfactory, and we must consider
the risk we shall run of doing this when $\sigma$ is really rather large.
The chance of acceptance is the area lying to the left of $s = 1.04$,
and in Fig. 1 we see how this changes as $\sigma$ increases. When $\sigma = 0.8$,
the chance of acceptance is 0.95; when $\sigma = 1.0$ it is 0.71 and when
$\sigma = 1.5$ it has decreased to 0.15. Now, it may happen that, although
we should like $\sigma$ to be less than 0.8, we should not definitely term
the product unsatisfactory until $\sigma$ were as much as 1.5. We then
see that with the value $s_0 = 1.04$ we may be running a risk of accepting
unsatisfactory material in 15 per cent. of cases. This will usually
be too often, and the only way of reducing this risk with the present
sample size would be to fix $s_0$ lower than 1.04. But clearly if we do
this we shall increase the first kind of risk alluded to, namely that
of rejecting satisfactory material (i.e. material for which $\sigma = 0.8$ or
less). There are thus two contradictory considerations in fixing $s_0$,
and it will be seen that it is impossible in the present instance, with a
sample of 10 only, to reduce to small proportions the risks of both
of the kinds of error referred to, namely (a) the error of rejecting
good material, and (b) that of accepting bad material.* To improve
the situation, the number of observations or tests must be increased.

* For a further discussion of these two kinds of error see a paper by E. S.
Pearson and Joan Haines, “The Use of Range in Place of Standard Deviation
in Small Samples,” Supplement to the Journal of the Royal Statistical Society,
Vol. II, No. 1, 1935. Also in connection with testing mean values a paper
by B. P. Dudding and Miss I. M. Baker, “The Application of Statistical
Methods to the Planning of Routine Testing Procedure,” Journal of the Society
of Glass Technology, 1934, Vol. 18.
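
The chances quoted in Section 3 can be checked numerically. The following is a minimal sketch, not the author's method of computation (the paper relies on printed tables): it assumes only that $ns^2/\sigma^2$ follows the $\chi^2$ distribution with $n - 1$ degrees of freedom, and evaluates the probability integral by a power series. The function names are illustrative.

```python
import math

def reg_lower_gamma(a, x):
    """Regularized lower incomplete gamma function P(a, x), by power series."""
    if x <= 0.0:
        return 0.0
    term = 1.0 / a
    total = term
    k = 0
    # Terms are all positive, so the series can simply be summed until negligible.
    while term > 1e-17 * total and k < 100000:
        k += 1
        term *= x / (a + k)
        total += term
    return total * math.exp(-x + a * math.log(x) - math.lgamma(a))

def chi2_cdf(x, df):
    """Chance that a chi-square variate with df degrees of freedom is below x."""
    return reg_lower_gamma(df / 2.0, x / 2.0)

def chance_of_rejection(s0, sigma, n):
    """p(s > s0) for samples of n from a Normal population with s.d. sigma."""
    return 1.0 - chi2_cdf(n * s0 ** 2 / sigma ** 2, n - 1)

# Rule "reject when s > 1.04" with samples of 10, for the four values of sigma in Fig. 1.
for sigma in (0.8, 1.0, 1.5, 2.0):
    p_reject = chance_of_rejection(1.04, sigma, 10)
    print(f"sigma = {sigma}: reject {p_reject:.2f}, accept {1.0 - p_reject:.2f}")
```

With these inputs the chances of acceptance should agree, to two decimal places, with the 0.95, 0.71 and 0.15 given in the text.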





4. Influence of Sample Size.

The implications for different sample sizes of the rule, reject
when $s > s_0$, are illustrated in Fig. 2. In this diagram the chance
of rejection is plotted against the $\sigma$ of the population sampled.
Since $p(s > ks_0)$ when $\sigma = k\sigma'$ is the same as $p(s > s_0)$ when $\sigma = \sigma'$, it
is seen to be sufficient to express $\sigma$ as a multiple of $s_0$. The same
diagram can thus be used, whatever value $s_0$ has. For convenience
$\sigma/s_0$ has been plotted on a logarithmic scale. The lines ruled across
at $p = 0.05$ and $p = 0.95$ cut each curve in two points, giving two
corresponding values of $\sigma$. The first of these we shall call $\sigma_l$, and is
such that for $\sigma < \sigma_l$ the chance of rejection is never more than 1 in
20. The second we shall call $\sigma_u$, and for $\sigma > \sigma_u$ the chance of
acceptance is never more than 1 in 20. Between $\sigma_l$ and $\sigma_u$ we have
what we shall call, in relation to our rule of rejection, a region of
doubt. Here we cannot be reasonably certain that the rule will do
either one thing or the other, i.e. will either reject infrequently or
accept infrequently.

Fig. 2. Probability of $s > s_0$ for Different Population $\sigma$ (sample sizes
n = 5, 10, 25 and 101; $\sigma$ on logarithmic scale).

If we take $n = 5$, we see that $\sigma_l = 0.73 s_0$ and $\sigma_u = 2.65 s_0$;
on increasing to $n = 10$, $\sigma_l = 0.77 s_0$ and $\sigma_u = 1.73 s_0$. As $n$ increases
further, the difference between $\sigma_l$ and $\sigma_u$ continues to decrease, and
in the limit when $n$ becomes very large both $\sigma_l$ and $\sigma_u$ will tend to $s_0$.
In Table I factors $b_1$ and $b_2$ are given for a number of values of $n$, and
from these $\sigma_l$ and $\sigma_u$ can be derived by means of the equations
$\sigma_l = b_1 s_0$ and $\sigma_u = b_2 s_0$. Throughout, so far, we have taken 0.05 as
representing a small risk. We may, of course, take 1 in 100 or
whatever we like for this risk. Much will depend on the particular
problem, and in any case it is usually difficult to attach a precise
meaning to definite probabilities. Often it will not be appropriate
to have the risks of both kinds of errors the same, but this will not
influence our procedure. In Table I factors are also given corresponding
to chances of rejection 0.01 and 0.99. Except for the last
four values of $n$, all the figures of the table are derived simply from
R. A. Fisher's $\chi^2$ table, and are the same as factors given in a different
connection by E. S. Pearson (B.S.I. Report, p. 69). The remainder
were obtained by the approximate method of taking the distribution
of the standard deviation to be normal when $n = 40$ and over. The
use of the table may be illustrated on the following hypothetical
problem.

Table I.

Factors $b_1$ and $b_2$ which give $\sigma_l = b_1 s_0$ and $\sigma_u = b_2 s_0$ for 5 per cent.
and 1 per cent. risks.

  Size of      b_1 for chance of rejection     b_2 for chance of acceptance
  sample n.        0.01         0.05               0.05         0.01

      5           0.614        0.726              2.652        4.103
      6           0.631        0.736              2.289        3.291
      7           0.645        0.746              2.069        2.833
      8           0.658        0.754              1.921        2.541
      9           0.669        0.762              1.815        2.338
     10           0.679        0.769              1.734        2.188
     15           0.717        0.796              1.511        1.794
     20           0.743        0.815              1.406        1.619
     25           0.763        0.829              1.344        1.518
     30           0.778        0.840              1.302        1.451
     40 *         0.806        0.858              1.254        1.387
     50 *         0.821        0.870              1.219        1.329
     70 *         0.843        0.886              1.176        1.262
    100 *         0.864        0.902              1.141        1.208

* These values based on normal approximation to s distribution.
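
Before turning to that problem, the factors of Table I for $n$ below 40 can be re-derived from the $\chi^2$ distribution. The sketch below is an illustration under stated assumptions, not the tabulation procedure used in the paper: the rejection rule $s > s_0$ is equivalent to $ns^2/\sigma^2 > ns_0^2/\sigma^2$, so $b_1 = \sqrt{n/\chi^2_{1-p}}$ and $b_2 = \sqrt{n/\chi^2_{p}}$, where $\chi^2_q$ is the value with chance $q$ below it for $n - 1$ degrees of freedom. The quantile is found by bisection; all function names are illustrative.

```python
import math

def reg_lower_gamma(a, x):
    """Regularized lower incomplete gamma function P(a, x), by power series."""
    if x <= 0.0:
        return 0.0
    term = 1.0 / a
    total = term
    k = 0
    while term > 1e-17 * total and k < 100000:
        k += 1
        term *= x / (a + k)
        total += term
    return total * math.exp(-x + a * math.log(x) - math.lgamma(a))

def chi2_cdf(x, df):
    """Chance that a chi-square variate with df degrees of freedom is below x."""
    return reg_lower_gamma(df / 2.0, x / 2.0)

def chi2_quantile(prob, df):
    """Value below which a chi-square variate falls with chance prob (bisection)."""
    lo, hi = 0.0, df + 10.0 * math.sqrt(2.0 * df) + 50.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if chi2_cdf(mid, df) < prob:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def table_factors(n, risk):
    """Return (b1, b2) such that sigma_l = b1 * s0 and sigma_u = b2 * s0."""
    df = n - 1
    b1 = math.sqrt(n / chi2_quantile(1.0 - risk, df))  # chance of rejection = risk
    b2 = math.sqrt(n / chi2_quantile(risk, df))        # chance of acceptance = risk
    return b1, b2

for n in (5, 10, 15, 20, 25, 30):
    b1, b2 = table_factors(n, 0.05)
    print(f"n = {n:3d}: b1 = {b1:.3f}, b2 = {b2:.3f}")
```

The starred rows of the table rest on a normal approximation, so an exact calculation of this kind need not reproduce them to the last figure.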





5. A Specification Problem.

A standardizing authority is faced with the problem of suggesting
a sampling procedure, to be applied to a consignment of goods, which
will satisfy two conditions, (a) the consignment from a manufacturer
who is controlling the variability of his product at $\sigma < 2.5$ must
rarely be held up, and (b) a consumer must run only a small risk of
accepting a consignment of a product for which $\sigma > 5$. How
large a sample must be taken and what value of $s_0$? Let us take
the permissible risks of both kinds of errors to be 1 in 20. Then the
first consideration demands that $n$ must be such that $\sigma_l = b_1 s_0$ is
greater than 2.5. Similarly $\sigma_u = b_2 s_0$ must be less than 5. This
means that $\sigma_l/\sigma_u = b_1/b_2$ must be greater than 0.5. We have
noted that $b_1/b_2$ increases with $n$, and thus we must choose the
first value of $n$ which makes this ratio greater than 0.5. For $n = 10$
the ratio is 0.769/1.734 = 0.44, which is too small. For $n = 15$
it is 0.527, and thus a sample size of 15 will suffice. (Actually 13
would have been sufficient, but it will be usual to specify a sample
size to the nearest 5.) We may now choose $s_0$ such that $\sigma_l = 0.796 s_0
= 2.5$, i.e. $s_0 = 3.14$. If we wanted the two risks to be 1 in 100, we
should naturally expect to require a larger sample. $b_1$ and $b_2$ would
then have to be taken from the 0.01 columns of Table I. For
$n = 25$, $b_1/b_2 = 0.763/1.518$, which is just greater than 0.5. Thus
25 would be the required sample size and $s_0$ would be 2.5/0.763 = 3.28.
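
The search described above can be sketched in code. This is an illustration only, under the same $\chi^2$ assumption as before; `smallest_sample` and its arguments are hypothetical names, and the search steps through every $n$ rather than only the values printed in Table I. Because the ratio $b_1/b_2$ lies very close to 0.5 in the neighbourhood of $n = 13$, an exact search may land a unit away from the figure mentioned in the text.

```python
import math

def reg_lower_gamma(a, x):
    """Regularized lower incomplete gamma function P(a, x), by power series."""
    if x <= 0.0:
        return 0.0
    term = 1.0 / a
    total = term
    k = 0
    while term > 1e-17 * total and k < 100000:
        k += 1
        term *= x / (a + k)
        total += term
    return total * math.exp(-x + a * math.log(x) - math.lgamma(a))

def chi2_quantile(prob, df):
    """Value below which a chi-square variate falls with chance prob (bisection)."""
    lo, hi = 0.0, df + 10.0 * math.sqrt(2.0 * df) + 50.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if reg_lower_gamma(df / 2.0, mid / 2.0) < prob:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def table_factors(n, risk):
    """Return (b1, b2) such that sigma_l = b1 * s0 and sigma_u = b2 * s0."""
    b1 = math.sqrt(n / chi2_quantile(1.0 - risk, n - 1))
    b2 = math.sqrt(n / chi2_quantile(risk, n - 1))
    return b1, b2

def smallest_sample(sigma_good, sigma_bad, risk, n_max=500):
    """First n with b1/b2 > sigma_good/sigma_bad, and the matching s0."""
    target = sigma_good / sigma_bad
    for n in range(2, n_max + 1):
        b1, b2 = table_factors(n, risk)
        if b1 / b2 > target:
            # s0 is chosen so that sigma_l equals sigma_good exactly.
            return n, sigma_good / b1
    return None

print(smallest_sample(2.5, 5.0, 0.05))   # 1 in 20 risks
print(smallest_sample(2.5, 5.0, 0.01))   # 1 in 100 risks
```

In practice the result would then be rounded up to a convenient sample size, as the text does in specifying 15.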

In this problem it was necessary for $\sigma_l/\sigma_u$ to be greater than 0.5.
If $\sigma_l/\sigma_u$ must be much nearer to unity, it is seen that $n$ must be
increased considerably. For example, if we did not wish to reject
when $\sigma < 4.5$ or accept when $\sigma > 5.0$, we should need $b_1/b_2 > 0.9$.
Now, taking 1 in 20 risks, even when $n = 100$, $b_1/b_2$ is 0.791. Thus
even with a sample size of 100, we could not set up our test to
satisfy the required conditions. The amount of testing which would
be sufficient is so extensive that the cost might very likely be prohibitive,
especially if the tests were destructive. This has led to
the suggestion that for certain types of product conformity to specification
should be secured, not solely by sampling the consignment
delivered, but by requiring that the manufacturer provide to the
standardizing authority evidence from suitable routine records of
the level of control maintained.*

* This question is discussed fully by E. S. Pearson, B.S.I. Report, Section V.

6. An Electric Lamp Problem.

We have proceeded so far on the assumption that the practical
steps taken in sampling a product can be sufficiently closely represented
by the concept of drawing random samples from a homogeneous
normal population. There is, however, one important
phenomenon which occurs again and again in industry which may 
force us to make our representation more complicated. This is 
that the manufactured units can often be grouped so that those in a 
group have a certain common element in their history not possessed 
by those not in the group, this causing them to resemble one another 
more closely. Thus articles made at the same time or on the same 
machine or from the same raw materials are liable to be more alike 
than articles drawn at random from the whole output. This has 
been termed a “ batch ” effect, and certain consequences follow from 
this effect which may be of importance to producer, consumer and 
standardizing authorities. These consequences had to be taken into 
account in the electric-lamp problem referred to at the beginning 
of the paper, and will now be discussed in relation to this problem. 
The characteristic which we shall be considering throughout is the 
initial efficiency of the lamp. (Initial efficiency is defined as the ratio, 
after a short ageing period, of the luminous output of the lamp to 
the input of power, and is expressed as lumens per watt.) 

The variability of lamps may be considered from two aspects. 
The manufacturer, on the one hand, will base his ideas on the results 
of routine tests of lamps taken from the factory at intervals of time. 
On the other hand, a consumer or standardizing authority may wish 
to form an opinion of the variability of the product as it appears on 
the market, and to do this without relying on information in the 
possession of the manufacturer. For this purpose the authority 
must take a sample, and must calculate from it some measure or 
measures of variation. The British Standard Specification, No. 161,* 
lays down conditions for drawing such samples; it suggests the use 
of the coefficient of variation of the sample, and gives for different 
types of lamps limits which this coefficient should not exceed. 

In clause 11 (b) of the specification it is stated that under certain
circumstances a sample may be taken of “not less than 25 lamps
purchased under the supervision of the testing authority in approximately
equal proportions from not less than 5 different traders.”
In clause 13 (b) it is specified that the coefficient of variation of such
a sample should not exceed a certain definite value. In the first
place, we shall view the specification from the standpoint of the producer.
He will want to know what proportion of samples from his
product will fail to satisfy the above clause and, if this proportion
is too large, at what level he must control his product to make it
sufficiently small. In order to answer this question it will be
necessary to distinguish different sources which influence the final
variability.

* British Standard Specification for Tungsten Filament General Service
Electric Lamps (May 1934): Publications Department, British Standards
Institution. (This specification will be referred to throughout as the B.S.S.)

7. Relevant Details of Manufacture.

The efficiency of a lamp is mainly determined by the quality of
its filament. From one tungsten ingot 5,000 to 10,000 metres of
filament wire may be drawn, and of this a length of about 1,500
metres is taken at a time, furnishing a batch of about 1,500 filaments.
Such filaments may be regarded as one unit, in that they undergo
together certain heat and chemical treatments which follow the drawing
of the wire. Lamps made from one set of filaments will be referred
to simply as a batch of lamps. Such lamps are found to be more
uniform than ones taken at random from the total product. We 
shall thus find it convenient to divide lamps into groups according 
to the origin of their filaments, distinguishing two sources of 
variability; first, the internal variation among lamps of the same 
batch, and second, the external variation among the means of 
batches. 

An observation of initial efficiency $x$ on any one lamp may be
represented as the sum of three parts, viz.:

$$x = \alpha + y + z,$$

where $\alpha$ is the population mean of the total product of lamps of the
type under consideration, $y$ is the difference between that mean
and the mean of the particular batch to which the lamp belongs,
and $z$ is the deviation of the individual from the mean of the batch.
Observations taken in the past indicate that, for practical purposes,
$y$ and $z$ can both be taken as normally distributed about zero. The
standard deviation, $\sigma_1$, of $y$ measures what we have termed external
variation, and the standard deviation, $\sigma_2$, of $z$ measures the internal
variation. The parameters $\alpha$, $\sigma_1$ and $\sigma_2$ are pointers indicating the
level of control of the quality of the product, and they may be
estimated by using an appropriate sampling procedure. If, in the
course of production, we take a sample from the factory by choosing
$k$ batches at random and taking $n$ individuals from each batch, we
have the algebraic identity

$$\sum_t \sum_i (x_{ti} - \bar{x})^2 = \sum_t \sum_i (x_{ti} - \bar{x}_t)^2 + n \sum_t (\bar{x}_t - \bar{x})^2, \qquad (1)$$

where $x_{ti}$ ($t = 1, 2, \ldots, k$; $i = 1, 2, \ldots, n$) is a single observation,
$\bar{x}_t$ the mean of those observations from the $t$th batch and $\bar{x}$ the mean
of the whole sample. Then, owing to the fact that $\sum_t \sum_i (x_{ti} - \bar{x}_t)^2$
is distributed as $\chi^2 \sigma_2^2$ with $k(n - 1)$ degrees of freedom and
$n \sum_t (\bar{x}_t - \bar{x})^2$ is distributed independently as $\chi^2 (n\sigma_1^2 + \sigma_2^2)$ with
$(k - 1)$ degrees of freedom, we obtain the following equations for
estimating $\sigma_1$ and $\sigma_2$ * (the arrow denotes “is an estimate of”):

$$\frac{\sum_t \sum_i (x_{ti} - \bar{x}_t)^2}{k(n - 1)} \rightarrow \sigma_2^2, \qquad \frac{n \sum_t (\bar{x}_t - \bar{x})^2}{k - 1} \rightarrow n\sigma_1^2 + \sigma_2^2.$$

Our best estimate of $\alpha$ is $\bar{x}$. The accuracy of these estimates increases
with $n$ and $k$, and a producer with his product under good control
will usually have enough test results to approximate very closely
to the true values $\alpha$, $\sigma_1$ and $\sigma_2$. Samples taken on the market in
accordance with B.S.S. are, however, comparatively small, and considerable
uncertainty will exist in assessing the level of control from
them alone. What this means to the producer will now be
considered.
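
The estimation just described can be sketched as follows. This is an illustration, not the author's computation: `variance_components` is a hypothetical name, the data are made up purely to exercise the arithmetic, and the step of subtracting the two mean squares to isolate $\sigma_1^2$ (with a negative result set to zero) is an assumption added here.

```python
def variance_components(batches):
    """batches: list of k lists, each holding n observations from one batch."""
    k = len(batches)
    n = len(batches[0])
    grand_mean = sum(sum(b) for b in batches) / (k * n)
    batch_means = [sum(b) / n for b in batches]

    # The three sums of squares of identity (1).
    total_ss = sum((x - grand_mean) ** 2 for b in batches for x in b)
    within_ss = sum((x - m) ** 2 for b, m in zip(batches, batch_means) for x in b)
    between_ss = n * sum((m - grand_mean) ** 2 for m in batch_means)

    within_ms = within_ss / (k * (n - 1))   # estimates sigma_2^2
    between_ms = between_ss / (k - 1)       # estimates n*sigma_1^2 + sigma_2^2
    # Assumption: solve the two equations for sigma_1^2, clipping at zero.
    sigma1_sq = max((between_ms - within_ms) / n, 0.0)

    return {
        "alpha": grand_mean,
        "sigma1_sq": sigma1_sq,
        "sigma2_sq": within_ms,
        "total_ss": total_ss,
        "within_ss": within_ss,
        "between_ss": between_ss,
    }

# Hypothetical figures: 3 batches of 3 observations each.
result = variance_components([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]])
print(result)
```

The returned sums of squares allow identity (1) to be confirmed directly, since the total should equal the within-batch part plus the between-batch part.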

8. The Problem. 

For simplicity let us suppose that some testing authority has
taken in accordance with clause 11 (b) a sample of 25 lamps, consisting
of 5 lamps from each of 5 different dealers. Suppose further that the
5 lamps so taken from any one dealer come from the same batch.
(This will almost certainly be the case if the 5 lamps belong to the
same box of 50 or 100, into which lamps are packed.) Then if
$t$ (= 1, 2, ..., 5) denote the batch to which a lamp belongs and
$i$ (= 1, 2, ..., 5) the 5 individuals from a batch which belong to the
sample, the efficiencies may be written

$$x_{ti} = \alpha + y_t + z_{ti}.$$

The coefficient of variation $v$ is

$$v = \frac{\sqrt{\sum_t \sum_i (x_{ti} - \bar{x})^2 / 25}}{\bar{x}} = \frac{s}{\bar{x}},$$

and if $v$ is greater than some value $v_0$ specified by B.S.S., the sample
will be regarded as representing material which is unsatisfactory in
respect of variability. Knowing $\alpha$, $\sigma_1$ and $\sigma_2$ for his product, the
manufacturer wishes to know in what proportion of cases samples,
taken as above, will fail to reach the requisite standard, and how this
proportion will be affected by changing $\alpha$, $\sigma_1$ and $\sigma_2$. Then if, for
instance, a reduction in the external variability $\sigma_1$ seems to be called

* The above method of analysing the variability of the sample is only a 
particular case of a type of procedure now in common use and referred to 
generally as the Analysis of Variance. As certain readers may not be familiar 
with this mode of analysis, a brief statement of the theorems involved in the 
present instance is given in the Appendix. 





for, he can consider whether it would be practical economically to
modify those processes of manufacture which affect $\sigma_1$. Smaller
values of $\sigma_1$ would result if, for example, tungsten ingots varied
less or if the various heat and chemical treatments could be more
rigidly controlled. It is the practicability of doing something like
this that would have to be considered. Similar considerations
apply if $\sigma_2$ calls for reduction, and it is probable that some of the
factors to receive attention here would be ones which had negligible
effect on $\sigma_1$.

Since $v = s/\bar{x} = (s/\alpha)(\alpha/\bar{x})$ and the coefficients of variation
$\sigma_1/\alpha$ and $\sigma_2/\alpha$ are small, the distribution of $v$ differs little, for practical
purposes, from that of $s/\alpha$. The proportion of samples giving
$v > v_0$ can be taken as equivalent to the proportion giving $(s/\alpha) > v_0$.
First we shall consider the distribution of $w = 25s^2 = \sum_t \sum_i (x_{ti} - \bar{x})^2$.


9. Distribution of w. 

Only results are given here, all mathematical derivation being
left to an Appendix. From (1) we see that $w$ is the sum of two parts,
the first of which is distributed as $\chi^2 \sigma_2^2$ with 20 degrees of freedom
and the other independently as $\chi^2 (5\sigma_1^2 + \sigma_2^2)$ with 4 degrees of freedom.
From this we deduce the distribution of $w$ to be


$$p(w)\,dw = \frac{e^{-w/2b}}{a^{10} b^2 (1/a - 1/b)^{10}} \left[ \frac{w}{4}\, I(z/\sqrt{10}, 9) - \frac{5}{(1/a - 1/b)}\, I(z/\sqrt{11}, 10) \right] dw,$$

where $a = \sigma_2^2$; $b = 5\sigma_1^2 + \sigma_2^2$; $z = \frac{w}{2}(1/a - 1/b)$

and

$$I(u, p) = \int_0^{u\sqrt{p+1}} e^{-v} v^p\,dv \Big/ \int_0^{\infty} e^{-v} v^p\,dv.$$

The functions $I(u, p)$ can be evaluated by means of the Incomplete
Gamma-function Tables,* and thus the ordinates of the distribution
can be obtained. The chance that $s/\alpha > v_0$ is the same as the chance
that $w > w_0$, where $w_0 = 25\alpha^2 v_0^2$. This chance, $P$, is given by
$\int_{w_0}^{\infty} p(w)\,dw$, and can be reduced to a form involving two incomplete
gamma-functions, viz.

$$P = 1 - I(r/\sqrt{11}, 10) + \frac{e^{-s}}{a^{10} (1/a - 1/b)^{10}}\, I\big((r - s)/\sqrt{11}, 10\big) \left[ 1 + s - \frac{10}{b(1/a - 1/b)} \right] + \frac{r^{10} s e^{-r}}{\Gamma(11)}, \qquad (2)$$

where $r = w_0/2a$ and $s = w_0/2b$.

* Loc. cit.





Provided $w$, $a$ and $b$ are expressed in the same units, (2) shows, as
of course must be the case, that it does not matter what this unit
is. It is usual to express $\sigma_1$ and $\sigma_2$ as percentages of the population
mean $\alpha$. This amounts to multiplying $a$ and $b$ by $(100/\alpha)^2$. For
the purpose of substitution into (2) $w_0$ must then be expressed on
the same basis, i.e. will be $25v_0^2$, where $v_0$ is the percentage coefficient
of variation.

10. Applications of the Probability Integral (2).

Clause 13 (b) of the B.S.S. allows acceptance of a sample if $v$ does
not exceed some definite $v_0$. For high voltage 60-, 75- and 100-watt
lamps this acceptance limit $v_0$ is 4 per cent., and, for simplicity, the
following examples will relate only to these types of lamps.

(i) Let us suppose that the external variation $\sigma_1$ is controlled
at 3 per cent., and ask ourselves what is the effect of making changes
in the factory, which will alter the internal variation $\sigma_2$. $w_0$ is
$25 \times 4^2 = 400$, $\sigma_1$ is 3, and we shall let $\sigma_2$ take values from 0 to 2½, the
units being percentages of $\alpha$. The probability of rejection is obtained
by substitution in (2) and is given in column 2 of Table II. The
values are plotted in Fig. 3 (continuous line). It will be realized
from this curve that if the external variation is controlled at 3 per
cent., there is little to be gained by trying to reduce the internal
variability below 1 per cent.

Table II.

100 × probability that s > 4 per cent. ($\sigma_1$ = 3 per cent.).

                     Percentage of rejections when sample consists
  sigma_2/alpha    (a) of 5 groups   (b) do., using approx.   (c) of 25 all from
      × 100            of 5.           of equation (9).       different batches.

        0              6.30                 6.4                     0.68
        ½              6.83                 6.8                     0.94
        1              8.26                 8.3                     2.14
       1½             11.18                11.4                     6.07
        2             16.59                17.4                    16.08
       2½             26.05                25.5                    34.21

(ii) Let us now suppose that the internal variability is controlled
at 1 per cent., and consider what would be the effect of any practical
procedure which alters $\sigma_1$. We have $w_0 = 400$, $\sigma_2 = 1$ and shall
let $\sigma_1$ take values from 2 to 4. The probability of rejection as
found from equation (2) is given in column 2 of Table III, and the
results are plotted in Fig. 4 (continuous line). The chance of
rejection is negligible as long as $\sigma_1$ is less than 2 per cent., and does
not become really important until $\sigma_1$ is 3 per cent. As $\sigma_1$ increases
beyond this level the proportion of rejections grows rapidly. If a
manufacturer has his product controlled with $\sigma_1$ = 3 per cent. and
$\sigma_2$ = 1 per cent., he has no need to fear that samples taken from it
will frequently fail to pass the specification.

[Figs. 3 and 4 (plots not reproduced); vertical axis: percentage probability of rejection.]

Table III.

100 × probability that s > 4 per cent. ($\sigma_2$ = 1 per cent.).

                     Percentage of rejections when sample consists
  sigma_1/alpha    (a) of 5 groups   (b) do., using approx.   (c) of 25 all from
      × 100            of 5.           of equation (9).       different batches.

        2              0.12                 0.1                     0.00
       2½              1.91                 1.7                     0.03
        3              8.26                 8.3                     2.14
       3½             19.16                19.5                    17.89
        4             32.06                32.5                    48.88
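
Entries in columns (a) and (c) of Tables II and III can be re-derived numerically. The sketch below does not use equation (2) or the incomplete gamma-function tables; as a swapped-in technique it integrates directly over the between-batch part, using the fact that $w$ is $a\chi^2$ (20 degrees of freedom) plus $b\chi^2$ (4 degrees of freedom) with $a = \sigma_2^2$ and $b = 5\sigma_1^2 + \sigma_2^2$. Both degrees of freedom are even, so the $\chi^2$ tail has a finite closed form, and Simpson's rule is assumed adequate for the remaining integral. Function names are illustrative, and $\sigma_1$ is assumed greater than zero.

```python
import math

def chi2_sf_even(x, df):
    """Chance that chi-square with an EVEN number df of degrees of freedom exceeds x."""
    if x <= 0.0:
        return 1.0
    half = x / 2.0
    term = 1.0
    total = 1.0
    for j in range(1, df // 2):
        term *= half / j
        total += term
    return math.exp(-half) * total

def reject_batched(sigma1, sigma2, v0, steps=4000):
    """P(w > w0) for 5 batches of 5 lamps; sigma1, sigma2, v0 in the same units."""
    a = sigma2 ** 2
    b = 5.0 * sigma1 ** 2 + sigma2 ** 2
    w0 = 25.0 * v0 ** 2
    y_max = w0 / b                      # beyond this the between-batch part alone exceeds w0
    tail = chi2_sf_even(y_max, 4)
    if a == 0.0:
        return tail

    def integrand(y):
        density = 0.25 * y * math.exp(-y / 2.0)   # chi-square density, 4 d.f.
        return density * chi2_sf_even((w0 - b * y) / a, 20)

    # Simpson's rule over 0 <= y <= y_max (steps must be even).
    h = y_max / steps
    total = integrand(0.0) + integrand(y_max)
    for i in range(1, steps):
        total += (4.0 if i % 2 else 2.0) * integrand(i * h)
    return tail + total * h / 3.0

def reject_unbatched(sigma1, sigma2, v0):
    """Same rule when all 25 lamps come from different batches (24 d.f.)."""
    return chi2_sf_even(25.0 * v0 ** 2 / (sigma1 ** 2 + sigma2 ** 2), 24)

for sigma1 in (2.0, 2.5, 3.0, 3.5, 4.0):
    pa = 100.0 * reject_batched(sigma1, 1.0, 4.0)
    pc = 100.0 * reject_unbatched(sigma1, 1.0, 4.0)
    print(f"sigma_1 = {sigma1}: (a) {pa:.2f} per cent., (c) {pc:.2f} per cent.")
```

Column (b) of the tables depends on the approximation of equation (9) of the Appendix, which is not reproduced in this section, and so is not attempted here.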

A more useful approach, and one of wider application, would be
to find what pairs of values $\sigma_1$ and $\sigma_2$ are such that they will make the
proportion of rejections less than a certain small amount, say 0.05.
The solution to this problem involves finding $\sigma_1$ and $\sigma_2$ such that
$P$ of equation (2) is 0.05. This is rather difficult, but a simplification
can be made if we are content to use the approximation to $P$ which
is given in equation (9) of the Appendix. This approximation gives
quite good results for the cases for which it has been tested out, and
as the role of probability theory in the present connection is simply
to give a rough idea of the implications of a specification, the small
errors involved in mathematical approximations are relatively
unimportant.

It will not be necessary in the following to restrict ourselves to
any particular value of the test rejection level $s_0$. As when dealing
with the case of sampling from a homogeneous normal population,
we can obtain general results which are applicable whatever $s_0$.
This is so because the probability of rejection involves $\sigma_1$, $\sigma_2$ and $s_0$
only as ratios $\sigma_1/s_0$ and $\sigma_2/s_0$. The pairs of values $\sigma_1/s_0$ and $\sigma_2/s_0$ for
which $p(s > s_0) = 0.05$ were found by using equation (9) and are
plotted in Fig. (5c). They form the curved boundary of the region
A, which is therefore such that, for any $\sigma_1$ and $\sigma_2$ within it, the risk
of rejection will be less than 1 in 20. Knowing his $\sigma_1$ and $\sigma_2$, the
producer can immediately see whether they fall within this “safe”
area.

The other viewpoint of the specification is that of the consumer
who wishes to know what sort of material he is in danger of accepting.
Using the same approximation as before, pairs of values $\sigma_1/s_0$ and
$\sigma_2/s_0$ were found such that $p(s < s_0) = 0.05$. These form the boundary
of the region B in Fig. (5c), and the consumer will accept less
than once in 20 times samples from a product whose true $\sigma_1$ and $\sigma_2$
belong to this region.

11. Comparison of Different Methods of Sampling.

The above example relates only to a sample of 25 lamps, where
these belong 5 to each of 5 different batches. It is of interest to see
what happens when the sample is taken in other ways. Since the
specification stipulates only that the lamps should not be taken from
less than 5 traders, it would be permissible to go to the extreme of
taking 25 lamps, every one from a different trader. In such a case
the distribution of the standard deviation, s, of the sample is much
simpler than it was before. If we write a single observation of
initial efficiency as

$$x_i = a + y_i + z_i \qquad (i = 1, 2, \ldots 25)$$

then each of the x's is independently distributed. (In the previous
case those belonging to the same group of 5 were not independent,
because the y contribution was the same for each lamp in a group.)

Fig. 5. Implications of Rule: reject when $s > s_0$. For Populations in A,
Chance of Rejection is less than 1 in 20; for B Chance of Acceptance is
less than 1 in 20; for C neither the Chance of Acceptance nor of Rejection
is less than 1 in 20.






Furthermore, $\sigma_x = \sqrt{\sigma_1^2 + \sigma_2^2}$ for each observation. Thus
$s = \sqrt{\Sigma(x_i - \bar{x})^2/25}$ will now be distributed as the S.D. of a sample of
25 from a population in which $\sigma = \sqrt{\sigma_1^2 + \sigma_2^2}$. The proportions of
rejection for the values $s_0$, $\sigma_1$ and $\sigma_2$ of the previous example are
given in Tables II and III and plotted in Figs. (3 and 4), and may
be compared with those obtained for the other method of sampling.
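
The percentages for a sample of 25 lamps all from different batches can be checked directly, since $25s^2/\sigma^2$ then follows the $\chi^2$ distribution with 24 degrees of freedom. The following is a minimal sketch in Python, not part of the original paper; the function names are hypothetical, and it relies on the closed form of the $\chi^2$ tail that holds when the number of degrees of freedom is even.

```python
import math


def chi2_tail_even(x, dof):
    """P(chi-square with an even number `dof` of degrees of freedom exceeds x).

    For dof = 2m the tail is a finite Poisson sum:
    exp(-x/2) * sum_{j < m} (x/2)^j / j!
    """
    if dof <= 0 or dof % 2 != 0:
        raise ValueError("closed form needs a positive even number of d.f.")
    half = x / 2.0
    term, total = 1.0, 0.0
    for j in range(dof // 2):
        total += term            # add (x/2)^j / j!
        term *= half / (j + 1)   # advance to the next Poisson term
    return math.exp(-half) * total


def pct_rejected_all_different(sigma1, sigma2, s0=4.0, n=25):
    """Percentage of samples with s > s0 when every lamp is from a different batch.

    s is taken with divisor n, so n * s^2 / sigma^2 is chi-square with n - 1 d.f.
    """
    sigma_sq = sigma1 ** 2 + sigma2 ** 2
    return 100.0 * chi2_tail_even(n * s0 ** 2 / sigma_sq, n - 1)


if __name__ == "__main__":
    for sigma2 in (0.0, 0.5, 1.0, 1.5, 2.0, 2.5):
        pct = pct_rejected_all_different(3.0, sigma2)
        print(f"sigma1 = 3, sigma2 = {sigma2}: {pct:.2f} per cent.")
```

With $\sigma_1 = 3$ the sketch gives about 0.68 per cent. for $\sigma_2 = 0$ and about 2.14 per cent. for $\sigma_2 = 1$, in agreement with column (c) of Table II.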

This comparison can also be effected by finding regions A and B
having the same properties as before. This is exceedingly simple,
for we can use Table I of the first part of the paper. From this we
get, for n = 25, the lower and upper values of $\sigma/s_0$ bounding what we
termed a region of doubt. We have only to replace $\sigma/s_0$ by
$\sqrt{\sigma_1^2 + \sigma_2^2}/s_0$ and we obtain
the circles of Fig. (5b). The region C is much smaller than it was
before, and it is clearly advantageous to take each individual from a
different batch. On the other hand, however, practical considerations
decide against this more efficient procedure because of the
expense involved in having to go to 25 different sources to compile a
sample. It has been found sufficiently troublesome to sample
from 5 dealers, particularly with the types of lamps that are not
stocked by all retailers. It will be dangerous, however, to sample
fewer sources; for, naturally, the fewer the number of batches
represented the more uncertain will be the information provided
about the external variation. Since the object of the electric-lamp
specification under consideration is the assessment of the gross
variability of the total output of a particular type of lamp, viz.
$\sqrt{\sigma_1^2 + \sigma_2^2}$, we cannot allow a sampling procedure which almost
ignores $\sigma_1$.

The extreme case would be a sample of 25 all from the same batch.
Such a sample could provide no information whatever about the
external variation $\sigma_1$. The efficiencies of the lamps would now be

$$x_i = a + y + z_i \qquad (i = 1, 2, \ldots 25)$$

where y is the same for all lamps in the sample, z is the only factor
which changes and thus $s = \sqrt{\Sigma(z_i - \bar{z})^2/25}$ will now be distributed
as the standard deviation of a sample of 25 from a population
in which $\sigma = \sigma_2$. The bounding values of $\sigma$ are obtained as before, and in Fig. (5a)
the corresponding regions A, B and C are shown. Clearly the
results of applying a rule of rejection are quite independent of the
external variation. This sampling procedure would be useful only
when information is wanted about the internal variation alone.

12. Conclusion. 

The consequences have been considered of basing upon the
standard deviation of a small sample a rule for determining whether
material is satisfactory or unsatisfactory with respect to variability.
In the first place, the population sampled has been assumed homogeneous
and normal, and the method of approach has been to see
what values of the true variability, $\sigma$, of the population will be such
that (i) samples will usually be rejected on applying the rule,
(ii) samples will usually be accepted and (iii) it cannot be said
either that samples will almost certainly be accepted or that they
will almost certainly be rejected. The influence of what is termed a
batch effect has then been examined, and this has been discussed in
relation to a clause in the British Standards Specification No. 161,
which concerns the efficiency of electric lamps. There are now two
variabilities, $\sigma_1$, the variability from batch to batch, and $\sigma_2$, the
variability within batches. For a sample size of 25, pairs of values
$\sigma_1$ and $\sigma_2$ have been divided as before into three groups, according to
the results which follow on applying the rule: reject when s is
greater than some $s_0$. This has been done for different methods of
sampling, and is illustrated in Fig. (5). The case of a sample constituted
of 5 lamps from each of 5 batches is considered rather fully,
and a concrete application of the theory is given to a problem which
might arise in producing lamps.

It will be noted that for small samples there is rather a large difference
between populations which lead to very few rejections and those
which lead to very few acceptances. If a specification is drawn up 
to be fair to a producer, then, judging from samples alone, the 
consumer may be in danger of accepting material quite frequently 
when it is rather inferior. This is not the whole position, however, 
for the consumer has this extra assurance: the producer wants as 
few rejections as possible, and so will try to control his product 
accordingly. It would be desirable if evidence of this endeavour to 
secure a satisfactory level of control could be made available to the 
consumer or testing authorities. This would mean adopting some 
plan by which a standardizing authority could examine suitable 
records of tests carried out as a routine procedure in the factory; 
clearly with such information the assessment of quality would be 
very much simplified. 


Appendix.

(i) Derivation of Results Quoted in Section (7).

$x_{ti}$ are a set of N = kn observations consisting of k groups
(t = 1, 2 … k) of n individuals in each group (i = 1, 2 … n) such
that

$$x_{ti} = a + y_t + z_{ti}$$

where y and z are normally and independently distributed about
zero with S.D.'s $\sigma_1$ and $\sigma_2$ respectively. With the usual notation
we have the identity

$$\sum_t \sum_i (x_{ti} - \bar{x})^2 = \sum_t \sum_i (x_{ti} - \bar{x}_t)^2 + n \sum_t (\bar{x}_t - \bar{x})^2 .$$

Now, the sum of any number of quantities which are normally
and independently distributed about zero is distributed in the
same fashion, with standard deviation equal to the square root
of the sum of the squares of the standard deviations of the components.
Hence $\bar{x}_t - a = y_t + \bar{z}_t$ is distributed normally about zero with
standard deviation equal to $\sqrt{\sigma_1^2 + \sigma_2^2/n}$. Also if $u_1, u_2 \ldots u_m$ are
any m quantities normally and independently distributed about the
same mean with standard deviation $\sigma$, then $\sum_1^m (u - \bar{u})^2$ is distributed
as $\chi^2\sigma^2$ with (m − 1) degrees of freedom, where the general $\chi^2$
distribution with f degrees of freedom is defined by

$$p(\chi^2) = \frac{1}{2^{f/2}\,\Gamma(f/2)} (\chi^2)^{f/2 - 1} e^{-\chi^2/2} .$$

Furthermore, the sum of a number of independent $\chi^2$'s is distributed
as $\chi^2$ with degrees of freedom equal to the sum of those of
the components. We thus have $\sum_t \sum_i (x_{ti} - \bar{x}_t)^2$ distributed as
$\chi^2\sigma_2^2$ with f = k(n − 1) and $n\sum_t (\bar{x}_t - \bar{x})^2$ as $\chi^2(n\sigma_1^2 + \sigma_2^2)$ with
f = k − 1. Also these two distributions are independent, since any
deviation $(z_{ti} - \bar{z}_t)$ is statistically independent both of $y_t$ and $\bar{z}_t$.

Since the mean of the general $\chi^2$ distribution is equal to the
number of degrees of freedom, f, the mean value of $\sum_t \sum_i (x_{ti} - \bar{x}_t)^2$ in
repeated samples is $k(n - 1)\sigma_2^2$ and of $n\sum_t (\bar{x}_t - \bar{x})^2$ is
$(k - 1)(n\sigma_1^2 + \sigma_2^2)$. From this it follows that the equations given in section (7) lead
to unbiassed estimates of $\sigma_1^2$ and $\sigma_2^2$.


(ii) Derivation of Equations of Section (9).

We have w = au + bv, where u is distributed as

$$p(u) = \frac{1}{2^{f/2}\,\Gamma(f/2)} u^{f/2 - 1} e^{-u/2} \qquad (f = 20) \tag{4}$$

and v is distributed independently in the same manner with f = 4.
Considering the more general case where for u, $f = f_1$ and for v,
$f = f_2$, we have:

$$p(w) = \int_0^{w/a} p_{f_1}(u)\, p_{f_2}\!\left(\frac{w - au}{b}\right) \frac{du}{b}$$

reducing by (4) to

$$p(w) = c_0\, e^{-w/2b}\, w^{\frac{f_1 + f_2}{2} - 1} \int_0^1 t^{\frac{f_1}{2} - 1} (1 - t)^{\frac{f_2}{2} - 1} e^{-\frac{wt}{2}\left(\frac{1}{a} - \frac{1}{b}\right)} dt$$

where $c_0^{-1} = a^{f_1/2}\, b^{f_2/2}\, 2^{\frac{f_1 + f_2}{2}}\, \Gamma(f_1/2)\, \Gamma(f_2/2)$.

The probability P′ that $w < w_0$ is given by integrating p(w)dw
from 0 to $w_0$. This double integral may be transformed to new
variables by the equations

$$w(1 - t) = v ; \qquad wt = u$$

and then takes the form

$$P' = c_0 \int_{u=0}^{w_0} \int_{v=0}^{w_0 - u} u^{\frac{f_1}{2} - 1} e^{-u/2a}\, v^{\frac{f_2}{2} - 1} e^{-v/2b}\, du\, dv . \tag{5}$$

In the present case $f_2$ is 4, i.e. $\frac{f_2}{2} - 1 = 1$, and $f_1 = 20$. Also

$$\int_{v=0}^{w_0 - u} v\, e^{-v/2b}\, dv = 4b^2 \left[ 1 - e^{-(w_0 - u)/2b} \left( \frac{w_0 - u}{2b} + 1 \right) \right] . \tag{6}$$

Substituting (6) into (5) and making use of the reduction formula
for incomplete gamma-functions, we are led finally to equation (2),
P being 1 − P′.

The reduction of (5) to the expression (2), which involves only
two incomplete gamma-functions, is possible because in our case
$f_2 = 4$. As long as $f_2$ is an even number we shall be able to express
(5) in terms of a finite number of such functions, but if $f_2$ is large the
expression will be complicated. In such a case, and also when $f_2$ is
odd, it would be well to have some method of approximating to P′.
Since the distribution of w ranges from 0 to ∞, a Pearsonian Curve of
Type III may give a satisfactory approximation. Let us consider
the Type III curve having the same mean and second moment as the
w-distribution. For u we have:

$$\text{mean } u = f_1 ; \qquad \mu_2 = 2f_1$$

and similarly for v. Hence for w

$$\text{mean } w = af_1 + bf_2 ; \qquad \mu_2 = 2(a^2 f_1 + b^2 f_2) . \tag{7}$$



The moments of the most general Type III curve:

$$p(w) = \frac{1}{(2g)^{f/2}\,\Gamma(f/2)} w^{f/2 - 1} e^{-w/2g} \tag{8}$$

are mean = gf; $\mu_2 = 2g^2 f$.

Equating these to (7) we have:

$$g = \frac{a^2 f_1 + b^2 f_2}{af_1 + bf_2} ; \qquad f = \frac{(af_1 + bf_2)^2}{a^2 f_1 + b^2 f_2} .$$

Using these values in (8) we have as an approximation to the
probability P that $w > w_0$:

$$P = 1 - I\!\left( \frac{w_0}{g\sqrt{2f}},\ \frac{f}{2} - 1 \right)$$

i.e.

$$P = 1 - I\!\left( \frac{w_0}{\sqrt{2(a^2 f_1 + b^2 f_2)}},\ \frac{(af_1 + bf_2)^2}{2(a^2 f_1 + b^2 f_2)} - 1 \right) . \tag{9}$$

The results of calculating P from equation (9), for the values of
$\sigma_1$ and $\sigma_2$ considered in section 10, are given in column 3 of Tables II
and III. The agreement with the results from the exact formula is
good. In certain other cases, however, the agreement was not so
satisfactory.
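
The two-moment Type III approximation can be illustrated numerically. The sketch below, in Python, is only an illustration under stated assumptions and is not the author's own computation: the function names are hypothetical, the incomplete gamma ratio is evaluated by its power series rather than from tables, and for the sample of 5 groups of 5 it takes $a = \sigma_2^2$, $b = n\sigma_1^2 + \sigma_2^2$, $f_1 = k(n - 1)$, $f_2 = k - 1$ and $w_0 = kn s_0^2$, as follows from part (i) of this Appendix.

```python
import math


def gamma_lower_regularized(shape, x, tol=1e-14, max_terms=10000):
    """Regularized lower incomplete gamma ratio, evaluated by its power series."""
    if x <= 0.0:
        return 0.0
    term = 1.0 / shape
    total = term
    for k in range(1, max_terms):
        term *= x / (shape + k)      # x^k / (shape (shape+1) ... (shape+k))
        total += term
        if term < tol * total:
            break
    log_prefactor = shape * math.log(x) - x - math.lgamma(shape)
    return total * math.exp(log_prefactor)


def type3_rejection_probability(a, b, f1, f2, w0):
    """Approximate P(a*u + b*v > w0) for independent chi-squares u (f1 d.f.), v (f2 d.f.).

    g and f are chosen so that g * chi-square(f) has the mean and variance of w.
    """
    g = (a * a * f1 + b * b * f2) / (a * f1 + b * f2)
    f = (a * f1 + b * f2) ** 2 / (a * a * f1 + b * b * f2)
    return 1.0 - gamma_lower_regularized(f / 2.0, w0 / (2.0 * g))


def pct_rejected_five_groups(sigma1, sigma2, s0=4.0, k=5, n=5):
    """Approximate percentage of rejections for k groups of n lamps (assumed set-up)."""
    a = sigma2 ** 2
    b = n * sigma1 ** 2 + sigma2 ** 2
    w0 = k * n * s0 ** 2
    return 100.0 * type3_rejection_probability(a, b, k * (n - 1), k - 1, w0)


if __name__ == "__main__":
    for sigma1 in (2.0, 2.5, 3.0, 3.5, 4.0):
        print(f"sigma1 = {sigma1}, sigma2 = 1: "
              f"{pct_rejected_five_groups(sigma1, 1.0):.1f} per cent.")
```

For $\sigma_1 = 3$, $\sigma_2 = 1$ the sketch gives a figure a little above 8 per cent., close to the value entered in the approximation column of Table II. When $\sigma_2 = 0$ the quantity w is an exact multiple of a single $\chi^2$ with 4 degrees of freedom, so the approximation is then exact.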



The Statistical Analysis of Field Counts of Diseased
Plants.

By W. G. Cochran, B.A.

Introduction.

In the study of the propagation of plant diseases, a common method 
of obtaining data is to examine every plant in a field or greenhouse for 
symptoms of a particular disease at certain intervals. Thus after 
each examination a field map can be prepared showing for each 
diseased plant in the field the earliest count at which the disease was 
noticed. The object of this paper is to discuss the statistical analysis 
of such data. The interest to the plant pathologist of those aspects 
of the distribution of diseased plants which are discussed below 
will vary with the particular disease and the state of knowledge of 
the mechanism by which it is propagated, but it is hoped that the 
questions considered will be of fairly general interest. In the discussion
two questions will be considered: (1) the distribution of the
plants which have become diseased in a given interval, (2) the relation
of that distribution to the distribution of plants previously diseased.

The discussion arose out of an examination by Bald of the spread 
of spotted wilt of tomatoes, a virus disease which is carried by a 
species of thrips, his data being obtained from field trials made at 
the Waite Institute, Australia. The map from which numerical 
examples will be taken was of a field of 16 plots, each of 6 rows 
containing 15 plants each, so that there were 1,440 plants in the 
field. The tomatoes were planted out on November 26, 1929, and
counts were made on December 18 and 31 and January 7, 15, and 22.
There were four treatments, arranged in a 4 X 4 Latin square, but 
as these produced no appreciable effect, and as the figures are being 
used merely to illustrate the methods of applying the tests, the 
differences in treatment will be ignored. It may be stated that, 
owing to previous knowledge of the plant pathologist about the 
mechanisms by which the plants may become diseased, this case is a 
relatively simple one, and in it certain of the tests of significance 
discussed below are not really required. 

The field map at the end of the second count is shown below, a X
representing a plant which was diseased at the first count (December
18) and a + one which was diseased at the second count (December
31). Plants which were found to be diseased at later counts are
marked by a •, but have not been used as examples.





The Areal Distribution of Diseased Plants. 

We consider at present the results of the first field count, showing 
the position of each diseased plant in the field. If every plant in the 
field has an equal and independent chance of becoming infected, 
the resulting distribution of infected plants over the area will be 
called a random one. The actual distribution may, however, 
deviate from a random one, owing to groups of diseased plants coming 
together more often than would occur by chance. The groups themselves
may be scattered irregularly over the area, such as might
happen if an insect carrying the disease had equal access to all
plants in the field, but when feeding was able to infect several neighbouring
plants at the same time. On the other hand, the deviation
from randomness may be of a more regular type, infection being
higher, for instance, near the borders than in the interior, or on one
side, owing to a source of infective insects near by.

Field Map of Diseased Plants. [Map not reproduced.]
X = diseased at first count, + = diseased at second count,
• = diseased at a later count.

An examination of whether the distribution of infected plants
may be regarded as random, and of the type of departure which it
shows, if any, may be made by dividing the area into small groups
containing the same number of plants, say from 6 to 12 per group.
If there are N groups and n plants per group, and every plant in the
field has an equal chance p (= 1 − q) of becoming infected, the
distribution of numbers of diseased plants per area is the binomial
one, the expected number of areas with r diseased plants being

$$N\, {}^{n}C_{r}\, p^{r} q^{n-r} . \tag{1}$$

By counting the number $N_r$ of groups observed with r diseased
plants, we may compare the observed and expected series by the $\chi^2$
test. Since, however, we expect the observed series to deviate from
the expected one by having too many groups with many diseased
plants, too many with few diseased plants and too few with numbers
of diseased plants about the average, the estimated variance would
appear to be a more sensitive quantity to use than $\chi^2$, which is a
general test for all types of deviation. A test of significance based on
the estimated variance in a binomial distribution has been given by
Fisher,(1) and may be obtained by noting that if $\bar{r} = \Sigma(rN_r)/N$ where, of
course, $N = \sum_{r=0}^{n} N_r$, then

$$\frac{n\, \Sigma N_r (r - \bar{r})^2}{\bar{r}(n - \bar{r})}$$

is distributed as $\chi^2$ with N − 1 degrees of freedom.

It was pointed out to me by Dr. Fisher that a simple proof of this
may be obtained by considering the data as arranged in a contingency
table. The proof will be given for the more general case in
which the numbers of plants in the small groups vary. In the kth
group (k = 1, 2 … N) let there be $n_k$ plants, $r_k$ of which are
diseased. Then our estimate of p is $\hat{p} = \Sigma r_k / \Sigma n_k$. For each
group the plants are divided into two classes, diseased ($r_k$) and
healthy ($n_k - r_k$), the expectations in the two classes being $n_k\hat{p}$ and
$n_k\hat{q}$ respectively. Thus the value of $\chi^2$ obtained by the usual
formula for testing the departure from independence in a contingency
table is

$$\chi^2 = \sum_{k=1}^{N} \left\{ \frac{(r_k - n_k\hat{p})^2}{n_k\hat{p}} + \frac{(n_k - r_k - n_k\hat{q})^2}{n_k\hat{q}} \right\} = \sum_{k=1}^{N} \frac{(r_k - n_k\hat{p})^2}{n_k\hat{p}\hat{q}} \tag{3}$$

and has (N − 1) degrees of freedom.

If $n_k = n$ for all k, this reduces to

$$\chi^2 = \sum_{k=1}^{N} \frac{(r_k - n\hat{p})^2}{n\hat{p}\hat{q}} = \frac{n\, \Sigma (r_k - \bar{r})^2}{\bar{r}(n - \bar{r})} , \tag{4}$$

since $\hat{p} = \bar{r}/n$.

If there are $N_r$ groups with r diseased plants, this may be written

$$\chi^2 = \frac{n\, \Sigma N_r (r - \bar{r})^2}{\bar{r}(n - \bar{r})} . \tag{4a}$$

In the proof quoted above of the $\chi^2$ distribution for a contingency
table, it is assumed that the expectation in any class (in this
case np or nq) is sufficiently large for the actual values obtained
to be regarded as normally distributed about the expectation. In
this work it may, however, be desirable to make tests where either
np, nq, or both, may be small. In the example shown below, for
instance, np is 1.63 and nq is 7.37, both small numbers. No examination
has yet appeared in print of the disturbance to the $\chi^2$ distribution
for the binomial series when the expectations are small, though
the question merits consideration. It is in many cases possible to
find the exact distribution of $\chi^2$, which is discontinuous, without
undue labour. The exact distribution of $\chi^2$ is given below for the
case n = 9, p = 0.2, N = 10, for that region which is of importance
in testing for significance. The expectations in the classes are in this
case 1.8 and 7.2. The ordinary $\chi^2$ distribution for 9 degrees of
freedom is also shown and gives an idea of the discrepancy from the
true probability.

$\chi^2$ distribution for the binomial series n = 9, p = 0.2, N = 10.

Probability of a value ≥ $\chi^2$.

Value of $\chi^2$.    True Probability.    Probability for ordinary    Discrepancy.
                                       $\chi^2$ distribution.

12.222            0.2397               0.2006                      −0.0391
13.611            0.1630               0.1366                      −0.0264
15.000            0.1032               0.0908                      −0.0124
16.389            0.0655               0.0591                      −0.0064
17.778            0.0405               0.0378                      −0.0027
19.167            0.0246               0.0238                      −0.0008
20.556            0.0141               0.0148                      +0.0007
21.944            0.0089               0.0091                      +0.0002
23.333            0.0048               0.0055                      +0.0005


The agreement here is reasonably good, the highest percentage
discrepancy being about 16 per cent. Since the values of $\chi^2$ are
evenly spaced, a correction for continuity could be made, but would
not improve the agreement near the significance points. In general,
it may be expected that with higher values of n or p the agreement
will be better than that shown above, but with lower values it will
be worse.

As an example of the application of the $\chi^2$ test for the binomial
series, the field was divided into areas, each containing 3 plants from
each of 3 rows, making 160 areas of 9 plants each. The distribution
of numbers of diseased plants obtained at the first count is shown
below.


r.          Observed $N_r$.    Expected.

0           36               26.45
1           48               52.70
2           38               46.67
3           23               24.11
4           10               8.00
5           3                1.77
6           1                0.25
7           1                0.03
8           0                0.00

Total       160              159.98


The values obtained by fitting a binomial series are also shown.
It will be noticed that the observed series differs from the binomial
in having too many groups with 4 or more diseased plants and no
diseased plants, and too few with numbers of diseased plants from
1 to 3. In this case $\chi^2$, as calculated from (4a) above, is 225.55 with
159 (= N − 1) degrees of freedom. Hence $\sqrt{2\chi^2} = 21.24$ and
$\sqrt{2N - 3} = 17.86$, the difference being 3.38 with unit standard
error. The difference is definitely significant, the 1 per cent. point
being 2.326, as only one tail of the normal distribution is being
considered. The ordinary $\chi^2$ test on this data gives $\chi^2 = 7.967$
with 3 degrees of freedom, which is just significant at the 5 per cent.
point.
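
The arithmetic of this example can be retraced from the observed frequencies alone. The following Python sketch is an illustration only (the function and variable names are hypothetical); it applies formula (4a), fits the binomial series with p estimated as the mean number diseased divided by n, and forms the large-sample normal deviate with N − 1 degrees of freedom.

```python
import math

# Observed numbers of areas N_r with r diseased plants (r = 0..8), first count.
observed = [36, 48, 38, 23, 10, 3, 1, 1, 0]
n = 9  # plants per area


def binomial_dispersion(observed, n):
    """Return (r_bar, chi2, expected) for the variance test of formula (4a)."""
    N = sum(observed)
    r_bar = sum(r * N_r for r, N_r in enumerate(observed)) / N
    ss = sum(N_r * (r - r_bar) ** 2 for r, N_r in enumerate(observed))
    chi2 = n * ss / (r_bar * (n - r_bar))
    p = r_bar / n  # estimated chance of infection per plant
    expected = [N * math.comb(n, r) * p ** r * (1 - p) ** (n - r)
                for r in range(len(observed))]
    return r_bar, chi2, expected


N = sum(observed)
r_bar, chi2, expected = binomial_dispersion(observed, n)
# Normal deviate for many degrees of freedom: sqrt(2*chi2) - sqrt(2*dof - 1),
# with dof = N - 1, i.e. sqrt(2N - 3) for the second term.
deviate = math.sqrt(2 * chi2) - math.sqrt(2 * N - 3)

print(f"N = {N}, mean diseased per area = {r_bar:.3f}, chi2 = {chi2:.2f}")
print("expected:", [round(e, 2) for e in expected])
print(f"normal deviate = {deviate:.2f}")
```

The sketch reproduces $\chi^2 = 225.55$ and the first expected frequencies 26.45 and 52.70. It evaluates $\sqrt{2N - 3}$ with N = 160 as about 17.80, so its deviate is about 3.43 rather than the printed 3.38; either figure lies well beyond the 1 per cent. point 2.326.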

The binomial series test as applied above takes no account of the
relative positions in the field of the groups of diseased plants. It is
on this account advisable to supplement the test by performing, where
possible, an analysis of variance on the numbers of diseased plants
in the groups. In a rectangular field the variation may be analysed
into rows, columns and remainder, which is taken as error. This
enables the significance to be tested of any type of gradient of infectivity
which can be expressed as a component of row or column
variation, such as, for instance, a steady increase in infection from
one side of the field to the other. Similarly, a test whether the incidence
of infection is higher in border plots than in interior plots may
be made by dividing the plots into border and interior plots, and
testing the difference between the mean infection per plot of the
two sections against the pooled variation within sections. In these
tests the exact z-distribution will not be followed, since we are dealing
with data from an approximately binomial, and not a normal,
distribution, but investigations have shown that the analysis of
variance is applicable over a fairly wide range of non-normality.

After removing a component in the analysis of variance, such as
the variation between rows or that between border and interior plots,
it is possible to make a $\chi^2$ test on the residual, or intra-class, variation.
If there are h rows and k columns and $r_{uv}$ is the total number
of diseased plants out of n in the plot in the uth row and vth column,
the quantity

$$n \sum_{u=1}^{h} \sum_{v=1}^{k} \frac{(r_{uv} - \bar{r}_u)^2}{\bar{r}_u (n - \bar{r}_u)}$$

where

$$\bar{r}_u = \frac{1}{k} \sum_{v=1}^{k} r_{uv}$$

will be distributed as $\chi^2$ with h(k − 1) degrees of freedom, provided
that each plant in a given row has an equal and independent chance of
becoming infected. The use of this type of test in conjunction with
the analysis of variance enables the experimenter to distinguish
whether, in addition to some regular variation in percentage of
infection over the area, there is a tendency for diseased plants to
congregate in patches.

Both types of test were made on the data used in the example 
above. The results of the analysis of variance are shown below : 



                 d.f.    Sums of squares.    Mean square.

Rows             7       21.00               3.000
Columns          19      36.62               1.927
Remainder        133     243.63              1.832


The mean square for rows is considerably, though not significantly,
above the error mean square. There is, however, some suggestion in
the data of a regular increase in percentage of infection from the
upper to the lower half of the field, and the row regression degree of
freedom corresponding to this effect has a mean square 8.702, and
is therefore significant. There seems no evidence of any change in
the percentage of infection from column to column.

A value of $\chi^2$ was calculated from the variation within rows for
each row, to see whether the deviation from a random distribution
could be accounted for by a variation in percentage of infection from
row to row. The values obtained, each with 19 degrees of freedom,
were 15.902, 28.077, 29.193, 18.348, 27.887, 18.219, 42.500, and
24.156. Since the 5 per cent. point is 30.144, only one of these is
individually significant. The sum of the values is 204.582 with 152
degrees of freedom and is itself significant. The conclusion from
these tests is that, in addition to a gradual increase in the degree of
infectivity from the top to the bottom of the field, there is a tendency
for diseased plants to congregate in small patches.

The Distribution of Groups of Diseased Plants in a Row.

Instead of examining whether groups of diseased plants in a small 
area tend to occur together more often than on the hypothesis of 
random distribution, it may be of more interest to consider groups of 
diseased plants taken along or across the rows. This would be the 
case, for instance, where the disease might be spreading from a plant 
to its neighbour in the row by mechanical or root contact, or by an 
infective insect crawling from one plant to the next in the row and 
retaining its infective power. The $\chi^2$ test by means of the binomial
series may be carried out as above. Where it is desired to test, 
however, whether groups of contiguous, and not merely neighbouring, 
diseased plants in a row, occur together more often than on the 
hypothesis of random distribution, a more sensitive test may be 
obtained by considering the distribution of runs of diseased plants 
in a row, r consecutive diseased plants which separate two healthy 
plants constituting a run of length r. 

If there are n plants in a row and these have an equal and independent
chance p of becoming infected, the expected number of runs
of r diseased plants is

$$f_{n,r} = \begin{cases} 2p^r q + (n - r - 1)p^r q^2 , & 1 \le r \le n - 1 \\ p^n , & r = n \end{cases} \tag{5}$$

This formula was given by Marbe,(2) who used it to examine runs
of births of the same sex in records of vital statistics, runs of heads
and tails in tosses of a coin and runs of red and black in roulette. A
proof of the distribution was given by Marbe (3) (p. 9), but this appears
to omit essential steps. An alternative proof may, however, be
obtained by induction. For it is easy to verify that the formula
holds for n = 2 or 3, and by noting that the $2^{n+1}$ possible configurations
in a row of length (n + 1) may be derived from the $2^n$ in
a row of length n by prefixing to each configuration a p and a q the
following relation may be derived:

$$f_{n+1,r} = f_{n,r} + p^r q^2 , \qquad 1 \le r \le n - 1$$

from which the induction follows at once.





If all runs are considered, irrespective of whether they are healthy
or diseased, the corresponding formula is

$$f_{n,r} = \begin{cases} 2(p^r q + pq^r) + (n - r - 1)(p^r q^2 + p^2 q^r) , & 1 \le r \le n - 1 \\ p^r + q^r , & r = n \end{cases} \tag{6}$$
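
Formulae (5) and (6) can be checked for short rows by direct enumeration of all $2^n$ configurations. The Python sketch below is an illustrative check only, with hypothetical function names; it is not the inductive proof indicated in the text.

```python
from itertools import product


def expected_runs_diseased(n, r, p):
    """Formula (5): expected number of runs of exactly r diseased plants in a row of n."""
    q = 1.0 - p
    if r == n:
        return p ** n
    return 2 * p ** r * q + (n - r - 1) * p ** r * q ** 2


def expected_runs_all(n, r, p):
    """Formula (6): expected number of runs of exactly r like plants, of either kind."""
    q = 1.0 - p
    if r == n:
        return p ** n + q ** n
    return (2 * (p ** r * q + p * q ** r)
            + (n - r - 1) * (p ** r * q ** 2 + p ** 2 * q ** r))


def run_lengths(row):
    """Yield (state, length) for each maximal run in a sequence of 0/1 states."""
    start = 0
    for i in range(1, len(row) + 1):
        if i == len(row) or row[i] != row[start]:
            yield row[start], i - start
            start = i


def enumerated_runs(n, r, p, diseased_only=True):
    """Expected number of runs of length r, summed over all 2^n configurations."""
    q = 1.0 - p
    total = 0.0
    for row in product((0, 1), repeat=n):  # 1 = diseased, 0 = healthy
        k = sum(row)
        prob = p ** k * q ** (n - k)
        count = sum(1 for state, length in run_lengths(row)
                    if length == r and (state == 1 or not diseased_only))
        total += prob * count
    return total


if __name__ == "__main__":
    n, p = 6, 0.3
    for r in range(1, n + 1):
        print(r, expected_runs_diseased(n, r, p), enumerated_runs(n, r, p))
```

For each r the two printed columns should agree to within rounding error; passing `diseased_only=False` to `enumerated_runs` gives the corresponding check of formula (6).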

Marbe used the series in testing a theory of his that long runs
occur in practice less frequently than is to be expected on the hypothesis
of independent and equal chances at each trial. His method
was to place the observed and expected series side by side, and decide
by inspection whether the data supported his hypothesis; that is, he
made no objective test, such as the $\chi^2$ test. It is interesting to
notice, however, that if in fitting the expected series we take p, the
only undetermined parameter, as the observed fraction of diseased
plants in the field, the totals of the observed and expected sets of
frequencies will not in general coincide, since the total number of runs
is in this case a variable quantity whose sampling distribution may
be found on the hypothesis we are considering. Thus the first condition
for the application of the $\chi^2$ test, namely, that the observed
and expected frequencies should have the same total, is not here
satisfied, and it is not at first sight clear whether the $\chi^2$ distribution
will be followed. It is, in fact, possible to show that $\chi^2$, as calculated
in the usual way from a set of rows each of length n, is distributed as

$$\lambda_1 x_1^2 + \lambda_2 x_2^2 + \ldots + \lambda_k x_k^2$$

where $x_1, x_2 \ldots$ are independently and normally distributed with
unit standard deviation, but the $\lambda_i$ are not all equal to unity. This
may be illustrated in the simplest case, in which the observed data
consist of N rows of length 2. It will be assumed that no distinction
is being made between healthy and diseased plants. If X denotes a
diseased plant and • a healthy plant, the data may be divided into
two mutually exclusive configurations:

(1) X • or • X, with probability $2pq = m_1/N$.

(2) X X or • •, with probability $p^2 + q^2 = m_2/N$.

The probability that $x_1$ of the N rows are of type (1) and $x_2$ of
type (2) is given by the binomial term

$$\frac{N!}{x_1!\,x_2!}\left(\frac{m_1}{N}\right)^{x_1}\left(\frac{m_2}{N}\right)^{x_2}$$

If N is large, we may replace this (cf. (4)) by the corresponding normal
distribution

$$f = C\exp\left\{-\tfrac12\left[\frac{(x_1 - m_1)^2}{m_1} + \frac{(x_2 - m_2)^2}{m_2}\right]\right\}d\tau$$

where $(x_1 - m_1) + (x_2 - m_2) = 0$ and $d\tau$ is the element of area.
Each configuration of type (1) gives two runs of length 1, and each
configuration of type (2) gives one run of length 2. Thus there are
$2x_1$ runs of length 1 and $x_2$ of length 2, so that

$$\chi'^2 = \frac{2(x_1 - m_1)^2}{m_1} + \frac{(x_2 - m_2)^2}{m_2} = (x_1 - m_1)^2\left(\frac{2}{m_1} + \frac{1}{m_2}\right)$$

But f may be written:

$$C\exp\left\{-\tfrac12 (x_1 - m_1)^2\left(\frac{1}{m_1} + \frac{1}{m_2}\right)\right\}dx_1$$

Hence $\chi'^2 = \chi^2\left(1 + \frac{m_2}{N}\right)$.

Thus $\chi'^2$ varies between $\tfrac32\chi^2$ and $2\chi^2$, so that the use of the $\chi^2$
distribution as a test would considerably over-estimate the significance
of any departure.
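
The inflation of the run statistic for rows of length 2 can be checked numerically. The following is a minimal sketch, assuming the two-configuration set-up described above; the function names and the trial values of N, p and the observed count are illustrative and are not taken from the paper.

```python
def chi_square_pair(x1, n_rows, p):
    """Return (chi2 computed from runs, chi2 of the binomial split) for rows of length 2."""
    q = 1.0 - p
    m1 = n_rows * 2.0 * p * q          # expected rows of type (1): X. or .X
    m2 = n_rows * (p * p + q * q)      # expected rows of type (2): XX or ..
    x2 = n_rows - x1                   # observed rows of type (2)
    # Runs: each type (1) row gives two runs of length 1, each type (2) row one run of length 2.
    chi2_runs = (2 * x1 - 2 * m1) ** 2 / (2 * m1) + (x2 - m2) ** 2 / m2
    chi2_true = (x1 - m1) ** 2 / m1 + (x2 - m2) ** 2 / m2
    return chi2_runs, chi2_true


def inflation_factor(p):
    """The factor 1 + m2/N = 1 + p^2 + q^2, which lies between 3/2 and 2."""
    q = 1.0 - p
    return 1.0 + p * p + q * q


runs_value, true_value = chi_square_pair(x1=50, n_rows=100, p=0.3)
ratio = runs_value / true_value
```

With these trial values the ratio equals 1.58, which is the value of 1 + p² + q² at p = 0.3.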

The disturbance is due to the fact that the runs of length 1 are 
only independent when grouped in pairs, since to any run of length 
1 there must correspond another of length 1. 

The same method of attack may be used in rows of length 3, 4, 5,
etc., and also when diseased plants alone are being counted, but the
evaluation of the values of the $\lambda_i$ rapidly becomes laborious.
With long runs it is possible to show that the significance levels of
$\chi'^2$ are higher than those of $\chi^2$.

The data considered by Marbe differed from those with which we
are concerned here in that, instead of having, say, 96 rows each of
length 15, he had in all his examples one single long row. In this
latter case a simple and satisfactory solution of the problem may be
found by a slight change in the specification. Suppose that the
method of obtaining a distribution of runs is to proceed along the
row until a fixed number N of runs has been secured, and not, as in
the $f_{n,r}$ series, until a fixed length of row has been covered. This is
equivalent to regarding the observed total number of runs as ancillary
information. Suppose that the long row consists of diseased and 
healthy plants, and that runs of diseased plants are being taken. 
To obtain the frequency distribution we proceed along the row until 
a diseased plant is reached. The probability that the next $(r - 1)$
plants are diseased and the next after is healthy is $p^{r-1}q$, and this
gives a run of r diseased plants. Hence

$$f_r = N p^{r-1} q, \qquad r = 1, 2, \ldots$$

so that the probabilities follow a geometric series law.

If both diseased and healthy plants are being counted, the
corresponding expression is

$$f_r = N(p^r q + p q^r), \qquad r = 1, 2, \ldots$$

In dealing with a single long row, the difference between Marbe’s
series and the geometric series is of academic interest only, since the
latter is the appropriate series to use and is easily handled. In the 
present case, however, the geometric series can be applied only if the 
rows, which contain 15 plants each, are considered as joined end to 
end to form one continuous line. Since, however, no test has been 
found from Marbe’s series which would be arithmetically simple to 
apply, this course has had to be taken. 

A test of departure from independence in the geometric series may
be obtained by estimating p from the distribution of runs of diseased
plants (which, it will be noted, takes no account of the number of
healthy plants), and comparing this with the observed fraction of
diseased plants. The probability of obtaining $x_r$ runs of length r,
r = 1, 2, . . . h, from N runs is

$$\frac{N!}{x_1!\,x_2!\cdots x_h!}\prod_{r=1}^{h}\left(p^{r-1}q\right)^{x_r}$$

Hence

$$L = \sum (r-1)x_r \log p + N \log q$$

$$\frac{\partial L}{\partial p} = \frac{\sum (r-1)x_r}{p} - \frac{N}{q}$$

so that the maximum likelihood estimate of p is

$$\hat{p} = \frac{\sum (r-1)x_r}{\sum r x_r}$$

That this quantity is an estimate of p may be seen alternatively by
considering that a run of length r occurs if the first $(r - 1)$
independent trials following the diseased plant at the beginning of the
run all give diseased plants, and the next trial gives a healthy plant.
The whole distribution of runs of diseased plants thus represents
$\sum r x_r$ independent trials, $\sum (r-1)x_r$ of which have each given a diseased
plant. It is also clear that, for a given total number $\sum r x_r$ of diseased
plants, $\hat{p}$ is greatest when there are many long runs. Thus the
amount by which $\hat{p}$ exceeds p, the observed fraction of diseased
plants, is a test of the association of diseased plants in groups. The
variance of $\hat{p}$, obtained from the average value of
$\partial^2 L/\partial p^2$, is $pq^2/N$, so that
we may compare $\hat{p} - p$ with its standard error $q\sqrt{p/N}$.



In applying the test in practice, the effect of the assumption that 
the rows may be placed end to end must be considered. In general, 
the effect will be slightly to diminish the sensitiveness of the test, 
since we regard certain plants as contiguous which are not really so. 
If, however, there is higher infection on the edges of the field, it is 
important that these be kept aside when making the test, otherwise 
long runs may occur in joining one row to another, and yet be due
entirely to the higher incidence of infection near the ends of the
rows. A variation in the incidence of infection would, of course, 
affect any such test, whether it involved joining rows together or not. 

The binomial series and geometric series tests, both made along 
the rows, would be far from independent, but there are differences in 
the types of departure from randomness which they detect. The 
geometric series test is most appropriate where the hypothesis is being 
studied that departure from randomness arises through spread or 
carriage of the disease from a plant to its direct neighbours in the 
row, and not merely to a nearby plant in the row. As an illustration 
of this, if X represents a diseased plant, • a healthy one, the two 
configurations • X X • X • X and • • X X X X • would count as 
the same in a binomial series test (if lying in the same set of n 
plants), but the latter would give much more weight to the hypo¬ 
thesis of departure from randomness in the geometric series test. 
Where it is thought that the disease is being carried from a plant to 
the next in the row, or that groups of contiguous plants are becoming 
diseased at the same time, the geometric series will be more sensitive 
than the binomial, and is the more natural and appropriate test. 
Otherwise, however, the binomial series is to be preferred, since the 
data for its application can be abstracted much more quickly. 

In the data used in the example above, the binomial series test 
made on groups of 9 plants showed that the diseased plants were 
scattered in patches over the area. The significant result when 
making this test might have arisen from the fact that groups of
diseased plants in a row tended to occur together, i.e. the grouping
might be associated with rows and not with areas. The geometric
series will therefore be applied to groups of diseased plants taken
along the rows. The distribution obtained is given below, the 
expected geometric series distribution being included for comparison. 


   r      Observed $f_r$    Expected $f_r$
   1          164              169.5
   2           33               30.7
   3            9                5.6
   4            1                1.0
   5            0                0.2
 Total        207              207.0


In this case there were 261 diseased plants in the field out of 1440,
so that p = 261/1440 = 0.18125.

$\hat{p} - p = 0.0257 \pm 0.0244$


The difference is not significant, so that grouping would appear 
to be associated with areas, and not with rows. This opinion is 
strengthened by making the geometric series test on lines taken at 
right angles to the rows. In this case $\hat{p}$ also exceeds p, but not
significantly so.
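
The arithmetic of the geometric series test may be retraced as follows. This is a sketch only: the observed count for r = 3 is taken as 9, the value implied by the totals of 207 runs and 261 diseased plants, and the standard error uses the expression $q\sqrt{p/N}$ evaluated at the observed fraction, which is an assumption about how the printed figure was reached. The function names are illustrative.

```python
import math

# Observed numbers of runs of diseased plants of each length r (r = 3 inferred from the totals).
observed = {1: 164, 2: 33, 3: 9, 4: 1, 5: 0}


def expected_geometric(n_runs, p, max_r):
    """Expected frequencies f_r = N p^(r-1) q for r = 1..max_r."""
    q = 1.0 - p
    return [n_runs * p ** (r - 1) * q for r in range(1, max_r + 1)]


def mle_p(run_counts):
    """Maximum likelihood estimate: sum (r-1) x_r divided by sum r x_r."""
    successes = sum((r - 1) * x for r, x in run_counts.items())
    trials = sum(r * x for r, x in run_counts.items())
    return successes / trials


n_runs = sum(observed.values())                         # total number of runs
n_diseased = sum(r * x for r, x in observed.items())    # total diseased plants
p_field = 261 / 1440                                    # observed fraction of diseased plants
expected = expected_geometric(n_runs, p_field, 5)
p_hat = mle_p(observed)
difference = p_hat - p_field
# Assumed form of the standard error, evaluated at the observed fraction.
std_error = (1.0 - p_field) * math.sqrt(p_field / n_runs)
```

The expected frequencies agree with the tabulated series to one decimal place, and the difference comes to about 0.0256; the standard error under the stated assumption is about 0.024, close to the printed 0.0244.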

A Test of Significance of Neighbour Infection . 

The geometric series test, as presented above, is intended as 
a general test to detect the occurrence of runs of contiguous diseased 
plants and is suggested on common-sense grounds only. A discussion 
of the appropriateness of the test in particular cases would, however, 
appear to be unnecessary here, since if it is possible to specify 
mathematically the type of departure from the geometric series 
distribution produced by any mechanism, an efficient test for this 
type of departure may be made by the method of maximum likelihood. 
A common method by which groups of contiguous diseased plants 
may occur is by spreading of disease from a plant to its direct 
neighbours, either by mechanical or root contact or through the 
medium of the soil; and it is worth while attempting to derive a 
sensitive test for this type of spread. 

To obtain such a test, it is necessary to examine the effect on the 
geometric series distribution of a spreading of disease. We suppose 
that in the first interval each plant in the field has an equal and 
independent chance p of becoming infected, and that in the second 
interval a plant which is next to a diseased plant has a probability 
s of being infected by it, healthy plants which are not next to a 
diseased plant remaining healthy. It follows that if a healthy plant
is between two diseased plants, its chance of remaining healthy is
$(1 - s)^2$, so that its chance of becoming diseased is $2s - s^2$. The
probability $F_r$ of a run of r diseased plants will now be a function of
r, p, and s. The exact solution has not been found, but approximate
solutions neglecting any power of s may be obtained. A proof of the
first approximation, which neglects s 2 , will be given. Further ap¬ 
proximations may be obtained by the same method, but are rather 
cumbersome. 

As in the proof of the geometric series already given, we suppose 
that the plants are arranged in an endless line, and that we proceed 
along the line until N runs have been obtained. It will be con¬ 
venient to include conventionally runs of length zero, each healthy 
plant, except those which conclude a run of diseased plants, being 
considered as a run of length zero. This convention has the advantage 
that, when we are considering the probability of any run, it may be 
assumed that the plant immediately preceding the run is healthy. 
Thus at any point in the line there will be a run of length zero if the
plant following was healthy at the end of the first interval and has
not since been infected by a diseased plant following it. The
probability is

$$q\{q + p(1 - s)\} = q(1 - ps)$$

If X represents a plant originally diseased, • a healthy plant and
+ a plant which has been infected by a neighbour, these cases may be
represented as

                 • •        or      • X
Probability      $q^2$                $qp(1 - s)$

For a run of length 1 there are two cases

                 X • •      or      X • X
Probability      $pq^2(1 - s)$        $p^2 q(1 - 2s)$

The sum is $pq\{1 - s(1 + p)\}$.

For a run of length r there are two similar cases, with a total
probability $p^r q\{1 - s(1 + p)\}$.

A run of length r may also arise in the two following ways

                 + X X . . . X •  or  X X . . . X + •        X X . . + . . X •
Probability      $2p^{r-1}q^2 s$                              $2(r - 2)p^{r-1}q^2 s$

The factor $(r - 2)$ in the last term arises because the + may occupy
any of the $(r - 2)$ interior positions in the run, and the probability of
infection is here $2s$ (more accurately $2s - s^2$), since there is a diseased
plant on both sides. Any run which has two or more + terms may
be ignored, since its probability will contain a factor $s^2$. The total
probability is therefore

$$F_r = p^r q\{1 - s(1 + p)\} + 2(r - 1)p^{r-1}q^2 s, \qquad r \ge 1$$

$$F_0 = q(1 - sp)$$
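
As a check on the first approximation, the probabilities $F_0, F_1, F_2, \ldots$ should still sum to unity. The sketch below evaluates the series numerically; the function name and the trial values of p and s are illustrative assumptions.

```python
def run_probability(r, p, s):
    """First-order probability F_r of a run of r diseased plants (r = 0 is a run of length zero)."""
    q = 1.0 - p
    if r == 0:
        return q * (1.0 - s * p)
    base = p ** r * q * (1.0 - s * (1.0 + p))            # runs with no '+' plant
    spread = 2.0 * (r - 1) * p ** (r - 1) * q * q * s     # runs containing exactly one '+' plant
    return base + spread


p_trial, s_trial = 0.18, 0.02
total = sum(run_probability(r, p_trial, s_trial) for r in range(200))
```

With s = 0 the same function returns the geometric series $p^r q$, and for any small s the terms in s cancel in the sum, so the total remains 1.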

A test of significance of the deviation of s from zero may be
obtained by estimating p and s by the method of maximum likelihood,
which also provides the sampling variance of the estimate of s. The
likelihood function is

$$\prod_{r \ge 0} F_r^{\,x_r}$$

Hence

$$L = x_0 \log q + x_0 \log(1 - ps) + \sum_{r \ge 1} r x_r \log p + N_1 \log q + \sum_{r \ge 1} x_r \log\left\{1 - s(1 + p) + 2(r - 1)\frac{qs}{p}\right\}$$

and, neglecting $s^2$,

$$\frac{\partial L}{\partial p} = \frac{X}{p} - \frac{N}{q} - s\left(N + \frac{2X_1}{p^2}\right)$$

where $X = \sum r x_r$ is the total number of diseased plants, N is the total
number of runs and $X_1 = \sum_{r \ge 1} (r - 1)x_r$. The equation of estimation of
p is

$$\frac{X}{p} - \frac{N}{q} = s\left(N + \frac{2X_1}{p^2}\right) \qquad (8)$$

If s = 0, this reduces to $\frac{p}{q} = \frac{X}{N}$, or $p = \frac{X}{N + X}$, the observed fraction
of diseased plants in the field.

$$\frac{\partial L}{\partial s} = -x_0 p(1 + ps) - N_1(1 + p) - N_1 s(1 + p)^2 + \frac{2X_1 q}{p}\{1 + 2s(1 + p)\} - \frac{4X_2 q^2 s}{p^2}$$

where $N_1 = \sum_{r \ge 1} x_r$, $X_2 = \sum_{r \ge 1} (r - 1)^2 x_r$.

This gives for s

$$\frac{2X_1 q}{p} - N_1(1 + p) - x_0 p = s\left\{\frac{4X_2 q^2}{p^2} - \frac{4X_1 q}{p}(1 + p) + N_1(1 + p)^2 + x_0 p^2\right\} \qquad (9)$$

The two equations (8) and (9) may be solved without much labour by
interpolation, substituting trial values for p in both equations,
calculating the two values of s and interpolating for their difference.
The solutions will be denoted by $\hat{p}$ and $\hat{s}$.

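Further, when s = 0

$$\frac{\partial^2 L}{\partial p^2} = -\frac{X}{p^2} - \frac{N}{q^2}, \qquad \frac{\partial^2 L}{\partial s\,\partial p} = -N - \frac{2X_1}{p^2}$$

$$\frac{\partial^2 L}{\partial s^2} = -x_0 p^2 - N_1(1 + p)^2 + \frac{4X_1 q}{p}(1 + p) - \frac{4X_2 q^2}{p^2}$$

and these have respectively the mean values

$$-\frac{N}{pq^2}, \qquad -\frac{N(3 - p)}{q}, \qquad -N(4 + pq)$$

Hence $\frac{1}{V(\hat{s})} = N\{4 + pq - p(3 - p)^2\} = Nq(1 + q)^2$.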

As an example of the application of the test, it is made below on
groups of diseased plants at the first count, taken along the rows.
The geometric series test has already been made on this data, and the
observed distribution is shown above. For this data

$X_1 = 54$, $X_2 = 78$, $N_1 = 207$, $X = 261$,

and the equations of estimation may be written:

$$261 - 1440p = \frac{q}{p}\,s\,(108 + 1179p^2)$$

$$108\frac{q}{p} - 207 - 1179p = s\left\{312\left(\frac{q}{p}\right)^2 - 216\frac{q}{p}(1 + p) + 207(1 + p)^2 + 972p^2\right\}$$

The solutions are $\hat{p} = 0.1733$, $\hat{s} = 0.0167$.

The standard error of $\hat{s}$, when s = 0, is 0.0178, so that the value of $\hat{s}$
is not significant.
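
The interpolation described for equations (8) and (9) can be imitated by bisection on p. The sketch below assumes the first-order forms of the two equations with the sample totals quoted above (N = 1179 runs including those of length zero, so that 972 runs are of length zero); the function names and the bracketing interval are illustrative choices, not part of the original computation.

```python
import math

X, N, X1, X2, N1 = 261, 1179, 54, 78, 207
x0 = N - N1  # runs of length zero


def s_from_eq8(p):
    """Value of s implied by equation (8) for a trial p."""
    q = 1.0 - p
    return (X - (X + N) * p) * p / (q * (2 * X1 + N * p * p))


def s_from_eq9(p):
    """Value of s implied by equation (9) for a trial p."""
    ratio = (1.0 - p) / p
    left = 2 * X1 * ratio - N1 * (1 + p) - x0 * p
    brace = 4 * X2 * ratio ** 2 - 4 * X1 * ratio * (1 + p) + N1 * (1 + p) ** 2 + x0 * p * p
    return left / brace


def solve(lo=0.15, hi=X / (X + N), iterations=60):
    """Bisection on the difference between the two trial values of s."""
    def gap(p):
        return s_from_eq8(p) - s_from_eq9(p)
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if gap(lo) * gap(mid) <= 0:
            hi = mid
        else:
            lo = mid
    p = 0.5 * (lo + hi)
    return p, s_from_eq8(p)


p_hat, s_hat = solve()
# Standard error of s-hat at s = 0, from 1/V = N q (1 + q)^2, evaluated at the observed fraction.
q_field = 1.0 - X / (X + N)
se_s = 1.0 / math.sqrt(N * q_field * (1.0 + q_field) ** 2)
```

Under these assumptions the solution falls at about p = 0.1733 and s = 0.0167, and the standard error comes to about 0.0177, close to the printed 0.0178.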

The Analysis of Later Counts . 

In considering the second and later counts, the experimenter may 
wish to study the distribution of plants which have become diseased 
since the last count, and its relation to the distribution of diseased 
plants at the beginning of the interval. In considering the first of 
these questions, plants which were previously diseased are ignored. 

The binomial series test and the analysis of variance may be
applied to the same small areas as were used for the first count, if
these are considered suitable. In this case the total number n of
plants in the small areas, ignoring plants diseased at the beginning of
the interval, will no longer be constant. If there are $n_k$ plants
remaining in the kth area, k = 1, 2, . . . N, of which $r_k$ become diseased
during the interval, $\chi^2$ is calculated from the formula (3) above:

$$\chi^2 = \sum_{k=1}^{N} \frac{(r_k - n_k p)^2}{n_k p q}$$


As will be expected, the test takes considerably longer to apply in 
practice than the usual test with equal values of n. 
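
A short sketch of the calculation with unequal $n_k$ is given below. It assumes that p is estimated by the pooled fraction of newly diseased plants; the data are invented purely for illustration and the function name is hypothetical.

```python
def binomial_series_chi2(plants, diseased):
    """Chi-square for areas with unequal numbers of plants n_k and diseased counts r_k."""
    p = sum(diseased) / sum(plants)   # pooled fraction (an assumption of this sketch)
    q = 1.0 - p
    return sum((r - n * p) ** 2 / (n * p * q) for n, r in zip(plants, diseased))


# Hypothetical counts for four small areas.
plants = [9, 8, 7, 9]
diseased = [1, 3, 0, 2]
chi2_unequal = binomial_series_chi2(plants, diseased)
```

Each area contributes its own squared deviation divided by its own binomial variance, which is why the calculation is longer than when all areas contain the same number of plants.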

The analysis of variance should in this case be made on the percen¬ 
tages of infection in the area, giving to each percentage a weight 
n k . Where the data are considered as classified in only one way, such 
as, for example, when we wish to compare the variance between rows 
with the remainder, or the difference between the percentage
infections of border and interior plots, the analysis is easily made. For
if $n_{ij}$, $r_{ij}$ are the totals of all plants and of diseased plants respectively
in the ijth area, and $N_i = \sum_j n_{ij}$, $R_i = \sum_j r_{ij}$, summed over the ith
row, and $N = \sum_i N_i$, $R = \sum_i R_i$, we have the identity

$$\sum n_{ij}\left(\frac{r_{ij}}{n_{ij}} - \frac{R}{N}\right)^2 = \sum n_{ij}\left(\frac{r_{ij}}{n_{ij}} - \frac{R_i}{N_i}\right)^2 + \sum n_{ij}\left(\frac{R_i}{N_i} - \frac{R}{N}\right)^2$$

where the summation extends in each case over all areas. From this
it follows that the variance between rows may be calculated as

$$\sum_i \frac{R_i^2}{N_i} - \frac{R^2}{N}$$

If, however, there are two criteria of classification, such as into rows
and columns, and we wish to take out the variance due to each, a 
rigorous solution can be made only by fitting constants. For a 
discussion of the problem and of some useful approximate methods, 
reference should be made to Yates (5) and Snedecor. (6) 
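
The weighted identity for a single classification can be verified numerically. The sketch below uses invented counts for three rows of areas; the data and the function name are hypothetical, and the weights are the numbers of plants $n_{ij}$ as in the text.

```python
def weighted_sums_of_squares(rows):
    """rows: list of rows, each a list of (n_ij, r_ij) pairs for the areas in that row."""
    n_total = sum(n for row in rows for n, _ in row)
    r_total = sum(r for row in rows for _, r in row)
    total = sum(n * (r / n - r_total / n_total) ** 2 for row in rows for n, r in row)
    within = 0.0
    between = 0.0
    shortcut = -r_total ** 2 / n_total
    for row in rows:
        n_row = sum(n for n, _ in row)
        r_row = sum(r for _, r in row)
        within += sum(n * (r / n - r_row / n_row) ** 2 for n, r in row)
        between += n_row * (r_row / n_row - r_total / n_total) ** 2
        shortcut += r_row ** 2 / n_row   # sum of R_i^2 / N_i, less R^2 / N
    return total, within, between, shortcut


# Hypothetical data: (plants remaining, plants newly diseased) in each small area.
rows = [[(9, 2), (7, 1), (8, 3)],
        [(6, 0), (9, 4), (9, 1)],
        [(8, 2), (8, 2), (5, 1)]]
total_ss, within_ss, between_ss, shortcut_ss = weighted_sums_of_squares(rows)
```

The total weighted sum of squares splits exactly into the parts within and between rows, and the part between rows agrees with the shorter computing form.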

An approximate analysis may, of course, be made by ignoring the 
differences in weight of the percentages of infection in the different 
areas. This is satisfactory only if the variations in $n_k$ are small.

The binomial series test and the analysis of variance were made 
on the data at the second count on the same set of small areas as 
used before. The value of $\chi^2$ for the binomial series exceeded its
mean value, but not significantly so, and the analysis of variance 
showed no sign of any regular gradient of infection. 

Similar considerations apply to the binomial series test made along 
the rows. The geometric series test, however, remains unchanged 
when we regard the plants which have not been ignored as lying in one 
long row. Two plants which are separated only by plants which have 
been ignored, have, however, to be considered as contiguous, and for 
this reason the test loses much of its appropriateness and does not 
appear to be of much interest. 

To pass to the relation between the above distribution and that 
of plants previously diseased, the chief question of interest will be 
whether the incidence of infection in the last interval is higher in the 
neighbourhood of a plant previously diseased. Where the binomial 
series test has been made as described above on plants diseased during 
the last interval, information on this question may be gained by 
comparing the percentage of infection in each small area during the 
last interval with the percentage previous to the beginning of the 
interval. Since these sets of figures will both be available, it is 
easy to examine by an analysis of covariance whether there is any 
apparent connection between them. In the data analysed above no 
relationship was found. 

A simple and effective test on this point may, however, be made 
by means of a 2 X 2 contingency table. This depends on a classi¬ 
fication of all plants which were healthy at the beginning of the last 
interval, into those which were in the neighbourhood of a plant 
previously diseased and those which were not so. The actual choice 
of a “neighbourhood” depends on the nature and extent of the
spread to be expected. If, for instance, the disease is considered to
be spread by a prevailing wind sweeping along the rows from left to
right, and it is thought that the disease might have spread from a
plant to any of the next three on its right in the row, the
classification would be made on this basis. If, on the other hand, it is
considered that the disease may be spreading by mechanical contact,
the “neighbourhood” of a diseased plant would be taken to include
all plants to which the infection from that plant might have spread 
by mechanical means. The classification having been made, the 
plants in each class are further divided into those which were diseased 
at the end of the interval and those which were healthy then, the 
usual test (cf. Fisher (1) , §§ 21-21.02) for a 2 X 2 table being made to 
determine whether the percentage of infection is significantly higher 
near a diseased plant. 

As an example of the application of this test, the data at the 
second count, December 31, were classified into those from which a 
diseased plant could be reached either by moving one place along the 
row or to the corresponding place in the next row on either side, and 
the remainder, the numbers of diseased and healthy plants in the two 
groups being recorded. The following table was obtained.

                              Diseased.   Healthy.   Total.
Near a diseased plant            111         441        552
Not near diseased plant          115         512        627
Total                            226         953      1,179


It will be observed that the incidence of disease is slightly higher in
the neighbourhood of previously diseased plants. To test for
significance, we calculate

$$\chi^2 = \frac{(110\tfrac12 \times 511\tfrac12 - 441\tfrac12 \times 115\tfrac12)^2 \times 1179}{226 \times 953 \times 552 \times 627}$$

The value is not nearly significant. It should be noted that if in
this case the only type of departure from independence which is being
taken into account is that in which the percentage of infection is
higher near a diseased plant, the 5 per cent. value of $\chi^2$ is 2.706, and
not 3.841. This can be verified at once if we remember that the
quantity $\chi$, defined as positive when the proportion of diseased plants
is higher near a previously diseased plant and negative when it is
lower, is distributed normally about zero with unit standard error.
If $\chi^2 = 3.841$, then $\chi = \pm 1.960$, and this is the 5 per cent. point
of the distribution of $\chi$ when both tails are being taken into account.
Thus the 5 per cent. point of $\chi^2$ is 3.841 only if we are prepared to
consider that the presence of a neighbouring diseased plant might
either increase or decrease the chance of infection in the second
interval. In the hypothesis mentioned at the beginning of this
paragraph, the 5 per cent. point value of $\chi^2$ is 2.706, corresponding
to the 5 per cent. value 1.645 of the positive tail of the distribution
of $\chi$.
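
The 2 × 2 calculation may be sketched as follows, assuming the usual correction for continuity in which the cross-product difference is reduced by half the grand total; the function name is illustrative. The two critical values are simply the squares of the normal deviates 1.645 and 1.960 mentioned above.

```python
def corrected_chi2(a, b, c, d):
    """Chi-square for a 2 x 2 table [[a, b], [c, d]] with the continuity correction."""
    n = a + b + c + d
    numerator = (abs(a * d - b * c) - n / 2.0) ** 2 * n
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


# Near a diseased plant: 111 diseased, 441 healthy; not near: 115 diseased, 512 healthy.
chi2_value = corrected_chi2(111, 441, 115, 512)
one_tail_point = 1.645 ** 2   # 5 per cent. point when only an increase is considered
two_tail_point = 1.960 ** 2   # 5 per cent. point when both tails are considered
```

On these figures the statistic comes to roughly 0.48, well below either critical value.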


An advantage of this table is that if it shows a significant departure 
from the hypothesis that the chance of infection of a plant is inde¬ 
pendent of its position relative to previously diseased plants, an 
estimate can at once be made of the chance of infection in the neigh¬ 
bourhood of a plant previously diseased. 

Discussion of Results. 

In the analysis given above, each test has been made in turn on 
the data, without reference to any previous knowledge of the plant 
pathologist about the spread of this particular disease. It is, how¬ 
ever, worth noting in conclusion how the results support such know¬ 
ledge in this case. The analysis on the data at the first count shows 
that there are signs of a gradient of infection across the field and that 
infection tends to congregate in small patches. These results are both
compatible with the fact that the disease is carried by insects. The 
patchiness could be explained by certain sections of the field being 
more attractive or more easily accessible to the insects, or, alterna¬ 
tively, if an insect retained its infective power after feeding on a plant 
and tended to crawl or make a short flight to a nearby plant to feed 
again. Further, in this case it is known that the disease is not likely 
to spread from plant to plant along the rows unless carried by the 
insects or unless infective juice were transferred, as might happen 
during pruning. The latter possibility may be excluded for the 
first two counts, and non-significance in the geometric series test 
along the rows and in the direct test of neighbour infection is to be 
expected. 

In the second count the indications of patchiness were very 
slight, and there was no marked irregularity in the distribution of 
the percentage of infection over the field. No connection could be 
established between the incidence of infection in the second interval
and the presence or absence nearby of plants previously diseased, which
is not surprising in view of the considerations above. In this case,
however, even if a relation had been found, it would not necessarily 
mean that disease was spreading from plants previously diseased, 
since the results could be explained on the hypothesis that certain 
small areas attracted the infective insects and were continuing 
to do so. 


Summary . 

The statistical analysis of counts of diseased plants in a field 
or greenhouse is discussed. Tests of significance are presented to 
examine (1) whether diseased plants tend to congregate in patches 
scattered over the area or in groups along or across the rows, (2) where 
more than one disease count has been made, whether the distribution
of plants recently infected is related to that of plants previously
infected. 

A test is also given which is designed to detect the spreading of 
infection from neighbour to neighbour in a row. 

It is a pleasure to thank Mr. J. G. Bald and Mr. F. Yates for 
considerable help in discussion, and the former for permission to use 
the data as examples. 


References. 

(1) Fisher, R. A., Statistical Methods for Research Workers. Edinburgh, Oliver
and Boyd (5th Edition, 1934), § 19.

(2) Marbe, K., Grundfragen der angewandten Wahrscheinlichkeitsrechnung und
theoretischen Statistik. München, C. H. Beck, 1934, p. 26.

(3) Marbe, K., Mathematische Bemerkungen. München, C. H. Beck, 1916,
pp. 8-9.

(4) Fisher, R. A., J. Roy. Stat. Soc., Vol. 85, Part 1 (1922), p. 89.

(5) Yates, F., J. Amer. Stat. Soc., March 1934, pp. 51-66.

(6) Snedecor, G. W., J. Amer. Stat. Soc., December 1934, pp. 389-93.




The Square Root Transformation in Analysis of Variance. 

By M. S. Bartlett. 

1. Introduction. 

The analysis of variance has by now been used in such a vast number
of problems that, in spite of its wide range of applicability, it would 
be surprising if examples had not occurred where its direct use was of 
doubtful value. Certain of these cases can, however, sometimes be 
more legitimately solved by a suitable transformation of our variate. 
The common occurrence of a type of experiment for which the square- 
root transformation has often been found useful, justifies a closer 
inspection of this particular transformation. Some illustrations of 
the type of experiment referred to are included in the paper. 

2. Theoretical Discussion. 

Just as, in order to stabilize the variance, the logarithmic trans¬ 
formation suggests itself when the standard error of a variate is 
proportional to its mean value, so when the variance is proportional 
to the mean, the square root may be considered. For the mean m 
large, and actually equal to the variance, we have 

$$\sigma^2(\sqrt{x}) = \sigma^2(x)\left(\frac{d\sqrt{m}}{dm}\right)^2 = \tfrac14 \qquad (1)$$

approximately; or more generally $\tfrac14\lambda$ if $\sigma^2(x) = \lambda m$.

Not only is it advisable to have constant variance when the mean 
levels of different blocks or groups vary, but it is of some convenience 
when treatment effects have been established to have chosen a scale 
so that a common standard error may be allotted to the treatment 
means. A further advantage is the relationship of variance with the 
nature of the distribution, a correlation of variability with mean level 
often implying excessive skewness. 

From equation (1), we see that the Poisson distribution suggests 
itself as a distribution for which the square-root transformation may 
prove useful (cf. (1) ), and replicated experiments where the results are 
numbers of the Poisson type (and where heterogeneity may exist, so 
that the use of $\chi^2$ is invalid) may often be analysed in this way.
Since (1) is an approximate formula only, it is of some relevance to
see how far the variance of $\sqrt{x}$ for a Poisson variate x is constant,
where the mean m becomes small. The variance of $\sqrt{x}$ was also for
comparison calculated for a continuous distribution for which the
variance equals the mean,

$$p \propto x^{m-1}e^{-x}\,dx \qquad (2)$$

In view of the discontinuous nature of the Poisson, the variance of
$\sqrt{x + \tfrac12}$ was further found for the Poisson distribution (because of
its analogy with Yates’s correction (2) for continuity in the $\chi^2$ test).
The results are given in Table I and Fig. 1.

Table I.

Change in Variance with m.

Mean m.    Poisson, $\sqrt{x}$    Poisson, $\sqrt{x + \tfrac12}$    Continuous, $\sqrt{x}$
  0.0         0.000               0.000                 0.000
  0.5         0.310               0.102                 0.182
  1.0         0.402               0.160                 0.215
  2.0         0.390               0.214                 0.233
  3.0         0.340               0.232                 0.239
  4.0         0.306               0.240                 0.242
  6.0         0.276               0.243                 0.243
  9.0         0.263               0.247                 0.247
 12.0         0.259               0.248                 0.249
 15.0         0.256               0.248                 0.250


The variance for the continuous curve approaches its limit
surprisingly quickly. That for $\sqrt{x}$ is reasonably convergent, but
shows a peak round about m = 1, which disappears when $\sqrt{x + \tfrac12}$
is used.

Thus above a mean of 10, say, $\sqrt{x}$ may be considered; from 10
to about 2 or 3, $\sqrt{x + \tfrac12}$ is preferable. Below a mean of about 2 or 3
the discontinuous nature of the Poisson distribution has, of course,
become so violent that any variate is of little use quite apart from
questions of variance, unless a large number of replications is
available.
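
The Poisson columns of Table I can be recomputed by direct summation over the Poisson probabilities. The following is a minimal sketch; the function name, the `shift` parameter and the number of terms summed are choices made for this illustration.

```python
import math


def poisson_variance_of_root(mean, shift=0.0, terms=400):
    """Exact variance of sqrt(x + shift) when x is Poisson with the given mean."""
    probability = math.exp(-mean)      # P(x = 0)
    mean_of_root = 0.0
    for k in range(terms):
        mean_of_root += probability * math.sqrt(k + shift)
        probability *= mean / (k + 1)  # recurrence P(k+1) = P(k) * m / (k+1)
    # E[(sqrt(x + shift))^2] = mean + shift
    return (mean + shift) - mean_of_root ** 2


var_root_half = poisson_variance_of_root(0.5)
var_root_one = poisson_variance_of_root(1.0)
var_root_two = poisson_variance_of_root(2.0)
var_shifted_one = poisson_variance_of_root(1.0, shift=0.5)
```

The values obtained agree with the tabulated 0.310, 0.402, 0.390 and 0.160, and for large means both forms approach the limiting value of one quarter.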

It is hardly possible to consider very fully here the question of
efficiency, without specifying too stringently the nature of the
experiment to be analysed; our efficiency naturally depends, for
example, on how we choose to define our treatment effects. As a
rough guide, however, we may note that the square $r^2$ of the
correlation of $\sqrt{x}$ or $\sqrt{x + \tfrac12}$ with x for a Poisson distribution is high
throughout. Thus the percentage efficiency $100r^2$ of the total
$\sum\sqrt{x}$ in large samples for estimating m, in comparison with the
sufficient statistic $\sum x$, has a minimum of about 88 per cent. at about
m = 2, that of $\sqrt{x + \tfrac12}$ about 96½ per cent. at the same value. In
so far as these figures differ, they favour $\sqrt{x + \tfrac12}$, but they have, of
course, only an indirect bearing on our problem. For the continuous
distribution, $\sum \log x$, not $\sum x$, is the sufficient statistic. The reason
for mentioning these figures is that if our data were, in fact, of a 
Poisson kind, and an analysis of variance and a corresponding 
summary of results were based not on x, but on some transformation 
of it, f(x), say, which was believed to correct non-normality or 
deviations in variance, we want to have some confidence that the 
treatment means given in terms of this new variate do adequately
summarize our data, and this they could not do if their efficiency for
estimating the true parameter means m on the original scale were low.

Fig. 1. The three curves (the largest variance first) are for
1. Poisson, $\sqrt{x}$; 2. Type III, $\sqrt{x}$; 3. Poisson, $\sqrt{x + \tfrac12}$.
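
The squared correlations quoted as a rough guide to efficiency can be recomputed for a Poisson variate by direct summation. This is a sketch under that assumption only; the function name and the number of terms are illustrative.

```python
import math


def squared_correlation_with_root(mean, shift=0.0, terms=400):
    """Squared correlation between x and sqrt(x + shift) for Poisson x."""
    probability = math.exp(-mean)
    e_root = 0.0        # E[sqrt(x + shift)]
    e_x_root = 0.0      # E[x * sqrt(x + shift)]
    for k in range(terms):
        root = math.sqrt(k + shift)
        e_root += probability * root
        e_x_root += probability * k * root
        probability *= mean / (k + 1)
    covariance = e_x_root - mean * e_root
    variance_root = (mean + shift) - e_root ** 2
    return covariance ** 2 / (mean * variance_root)   # Var(x) = mean


efficiency_root = 100 * squared_correlation_with_root(2.0)
efficiency_shifted = 100 * squared_correlation_with_root(2.0, shift=0.5)
```

At m = 2 these come to about 88 and 96 per cent. respectively, in line with the figures given in the text.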

The question of the normality of the transformed variate has 
already been mentioned briefly, and it was pointed out that the 
transformation will probably have made our variate nearer normal 
than before (e.g. s in normal theory approaches normality more 
rapidly than s 2 ). The discontinuous nature of a Poisson variate is a 
danger, but this will only be important for small m, when at least four 
or five replications should be available. If m is very small, but a 
fairly large number of replications is available, it might sometimes be 





71 


1936] Transformation in Analysis of Variance. 

advisable to add, say, pairs of replications before square roots are 
taken.* 

3. Types of Variation in Practice . 

Since in practice (apart from the convenience of allotting a 
common standard error to several treatment means) we use a $\sqrt{x}$
analysis in place of $\chi^2$ because of the lack of homogeneity of our
material, the theoretical discussion above must necessarily be 
supplemented by an examination of the stability of the variance 
under actual experimental conditions. An illustration of one type of 
data that occurs is given below, being weed infestation counts in one 
of ten experiments on weed control in cereals. 


(1) 438   (4)  17   (2) 538   (5)  18   (3)  77   (6) 115
(3)  61   (2) 422   (6)  57   (1) 442   (5)  26   (4)  31
(5)  77   (3) 157   (4)  87   (6) 100   (2) 377   (1) 319
(2) 315   (1) 380   (5)  20   (3)  52   (4)  16   (6)  45


Fig. 2.—Poppies in Oats (plants per 3} sq. ft.). 


The variation here is large, and treatment (2) and the Control (1) 
could, of course, have been omitted, if necessary, from any analysis; 
but this example is purposely given to illustrate the frequent stability 
of variation when measured on the square-root scale, even when of a 
heterogeneous character. In this and the other nine experiments of 
the series, in spite of heterogeneity (indicated by standard error 
greater than a half, and differences between blocks, as well as large 
differences between some treatments), there was no evidence that the 
variance of $\sqrt{x}$ was correlated with degree of infestation. An
analysis of $\sqrt{x}$ was made for eight of the experiments, and $\sqrt{x + \tfrac12}$
for two giving small numbers. The mean error variance for the
series was 1.62.

A sufficiently accurate method of examining the stability of the 
variance when either blocks or treatments show small differences is to 
examine the constancy of the range for the different treatments 
or blocks. 


As an example of a similar though slightly different kind of 
experiment, counts per sample area of a variety of cockchafer larvae
(two age-groups a and b) are shown in Table II. 


* The same procedure is useful when we are testing the homogeneity of
data by means of $\chi^2$.






Table II.

Control of Cockchafer Larvae.

Treat-      A         B         C         D         E         F         G         H
 ment     a   b     a   b     a   b     a   b     a   b     a   b     a   b     a   b
   1     13  28    29  61     5   7     5  14     0   3     1   7     1  10     4  13
   2     16  12    12  49     4   2    12   5     2   3     1   6     3   5     4  11
   3     13  40    23  48     4   4     1  14     2   2     1   7     1   8     7  10
   4     20  31    15  44     1   5     5   9     2   7     3   7     0   3     3  12
   5     16  22    17  45     2   2     3   8     0   0     5   4     1   6     1   8


The experiment consisted of five treatments in eight randomized
blocks; the division of the larvae into two groups demonstrated the
differential effects of the treatments according to age. Unfortunately
the count of the larvae in the whole plot, as in blocks A and B,
was not continued for reasons of time, and sample quadrat counts
over a quarter of the plot were carried out in blocks C to H. This
fact was, however, ignored in the analysis (of $\sqrt{x + \tfrac12}$), for reasons
given below, the figures for the sample counts in blocks C to H being
recorded directly in Table II without adjustment. The standard
errors were 0.50 for a, and 0.59 for b, these figures suggesting fairly
homogeneous random material, in spite of significant differences
between treatments and between blocks for a (apart of course from
the artificial block difference A, B versus C, D, E, F, G, H).

The existence of block differences for variation of a Poisson kind
raises the theoretical question that the information on any treatment
effect may be different in different blocks. If the probabilities of
occurrence for two treatments are proportional to $p$ and $(1 + \epsilon)p$,
$\epsilon$ small, then the information on $\epsilon$, as measured by the reciprocal of
the variance of the error of estimation, is proportional to $p$, and in $k$
blocks proportional to $k\bar{p}$. This would be the information contained
in the treatment totals if $\chi^2$ were a possible test. If, however, real
treatment × block interactions existed, we should prefer to define
our treatment effect as the average for the replicates, giving each
block equal weight, as in an analysis of variance. The square-root
analysis seems to provide us with a fairly useful compromise; we
average our treatment effects for the purpose of the analysis on the
square-root basis, and note that if we were able to define them as in
the ideal Poisson case, we should lose very little information thereby.
It is, as in the previous note on efficiency, hardly possible to consider
the efficiency of the analysis exactly, but the effect of averaging the
square roots of the different blocks is ($\epsilon$ small) to obtain the amount
of information of an order proportional to

$$(\sqrt{p_1} + \sqrt{p_2} + \dots + \sqrt{p_k})^2 / k,$$







as can be seen if the error of estimation from such a square-root
average is considered. The information lost we may therefore expect
in all typical cases to be negligible. Even in the artificial case above,
where $p_1$ and $p_2$ are four times $p_3 \dots p_8$ approximately, our efficiency
is still of the order of 90 per cent.
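
The figure of about 90 per cent. can be checked with a short calculation. The sketch below assumes that the efficiency is the ratio of the information from the averaged square roots, $(\sqrt{p_1} + \dots + \sqrt{p_k})^2/k$, to the information in the treatment totals, $p_1 + \dots + p_k$; the function name is introduced here for illustration.

```python
import math

def sqrt_average_efficiency(p):
    """Information from averaging square roots over blocks, relative to the
    information in the treatment totals: (sum sqrt(p_i))^2 / (k * sum p_i)."""
    k = len(p)
    return sum(math.sqrt(v) for v in p) ** 2 / (k * sum(p))

# Artificial case from the text: p1 and p2 four times p3 ... p8.
p_blocks = [4, 4, 1, 1, 1, 1, 1, 1]
efficiency = sqrt_average_efficiency(p_blocks)
print(f"efficiency = {efficiency:.3f}")   # 100/112, about 0.893
```

With equal block probabilities the ratio is unity, so the loss arises only from the inequality of the blocks.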

It is advisable to point out here that although the variance of
$\sqrt{x}$ in these and several other experiments has proved stable, this is
not the only result to be expected. For example, if heterogeneity
exists in such a way that we can imagine our mean having itself a
natural variation, we might write our variance as

$$m + \sigma^2(m),$$

where, under certain assumptions, the change in $\sigma^2(m)$ with $m$, both
for natural field variation and for controlled changes due to treatment,
is proportional to $m^2$. Variation appearing to be of this kind
has, in fact, been observed, and analysed for suitable groups of
treatments, numbers from obviously discordant plots, such as control
plots, being omitted from the square-root analyses. In extreme cases
of this type where $\sigma^2(m)$ is large compared to $m$, it is conceivable that
(for $m$ of reasonable size) a logarithmic transformation would prove
of use.

4. Variation with Limited Range.

The type of experiment for which square roots may prove useful
is of the field type where the entity counted is of a Poisson nature,
and unlimited. Experiments where variation is of the binomial
type, and of fixed range (as would, for example, occur if the incidence
of disease is to be noted in an experiment where a definite number $n$
of plants has been sown) will not fall into this group unless the actual
variation observed occurred at one end of this range. For cases
where a fair part of the entire range (0 to $n$) is covered, Mr. F. Yates
suggested to me that the function $\sin^{-1}\sqrt{x}$ might, by analogy with
$\sqrt{x}$, be considered. For with the binomial, mean $m$ (scale 0 to 1), the
variance is proportional to $m(1 - m)$. Hence if $f$ is the function of
$x$ or $t/n$ to have constant variance,

$$\frac{df}{dx} \propto [x(1 - x)]^{-1/2}$$

approximately, or $f \propto \sin^{-1}\sqrt{x}$.

It is hardly likely that any experiment would be carried out with
$n < 10$, which we may regard as a lower limit for our binomial. The
variance of $y = \frac{2}{\pi} f$ was examined for $n = 10$, and also for a
corresponding continuous distribution

$$p \propto x^{\alpha - 1}(1 - x)^{8 - \alpha}\,dx \qquad (3)$$





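the variance of which is $m(1 - m)/10$, the mean $m$ being equal to
$\alpha/9$. By analogy with $\sqrt{x + \tfrac12}$, the variance of

$$z = \frac{2}{\pi} \sin^{-1} \sqrt{\frac{t \pm \tfrac12}{10}}$$

was also found, $t$ being the binomial variate, and $\tfrac12$ being added or
subtracted according as $t$ is less than or greater than 5.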

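The binomial variances were, of course, computed directly. For
(3), we note that

$$\int_0^1 \sin^{-1}\sqrt{x}\; p\,dx = \Big[\sin^{-1}\sqrt{x} \int p\,dx\Big]_0^1 - \int_0^1 \frac{\int p\,dx}{2\sqrt{x}\,(1 - x)^{1/2}}\,dx,$$

and may easily be evaluated for $\alpha$ and $\beta$ $(= 9 - \alpha)$ integers. Further,
we have

$$\int_0^1 (\sin^{-1}\sqrt{x})^2\, p\,dx = \Big[(\sin^{-1}\sqrt{x})^2 \int p\,dx\Big]_0^1 - \int_0^1 \sin^{-1}\sqrt{x}\; x^{-1/2}(1 - x)^{-1/2} \int p\,dx\;dx.$$

Since for $f = \sin^{-1}\sqrt{x}$,

$$D_0^{\,n+2} f = n^2 D_0^{\,n} f,$$

where $D_0^{\,n}$ stands for $(d^n/d\sqrt{x}^{\,n})_{x=0}$, we obtain as a Maclaurin series
in $\sqrt{x}$,

$$f = \sqrt{x} + \frac{x^{3/2}}{3!} + \frac{3^2\,x^{5/2}}{5!} + \frac{3^2 \cdot 5^2\,x^{7/2}}{7!} + \dots,$$

which can be used to evaluate the second integral.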

The results are given in Table III for $\alpha$ = 0 to 4, the value
$\alpha = 4\tfrac12$ being the point of symmetry, and in Fig. 3.


Table III.

Change in Variance with m.

Mean m.    Binomial, y.    Binomial, z.    Continuous, y.
   0          0.0             0.0             0.0
  1/9         0.0170          0.0076          0.0100
  2/9         0.0149          0.0091          0.0108
  3/9         0.0126          0.0081          0.0111
  4/9         0.0117          0.0067          0.0112
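
The two binomial columns can be recomputed directly. The sketch below assumes that $y = (2/\pi)\sin^{-1}\sqrt{t/n}$ and that $z$ is the same function with $\tfrac12$ added to $t$ below the middle of the range and subtracted above it, the middle value being left unaltered; the function names are introduced here for illustration.

```python
import math

def binomial_pmf(t, n, p):
    """Probability of t successes in n trials."""
    return math.comb(n, t) * p**t * (1 - p) ** (n - t)

def y_transform(t, n):
    """y = (2/pi) * arcsin(sqrt(t/n))."""
    return (2 / math.pi) * math.asin(math.sqrt(t / n))

def z_transform(t, n):
    """As y, but with 1/2 added to t below n/2 and subtracted above n/2."""
    if t < n / 2:
        t = t + 0.5
    elif t > n / 2:
        t = t - 0.5
    return (2 / math.pi) * math.asin(math.sqrt(t / n))

def variance(transform, n, p):
    """Exact variance of the transformed binomial variate."""
    probs = [binomial_pmf(t, n, p) for t in range(n + 1)]
    vals = [transform(t, n) for t in range(n + 1)]
    mean = sum(w * v for w, v in zip(probs, vals))
    return sum(w * (v - mean) ** 2 for w, v in zip(probs, vals))

n = 10
for a in range(5):
    m = a / 9
    print(f"m = {a}/9: var(y) = {variance(y_transform, n, m):.4f}, "
          f"var(z) = {variance(z_transform, n, m):.4f}")
```

For comparison, the limiting variance of $y$ for large $n$ is $1/(\pi^2 n)$, or about 0.0101 when $n = 10$, which is close to the values found for the continuous distribution.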


The curves for the binomial depart considerably from that for
the continuous distribution. That for y has an expected divergence
at each end analogous to that for $\sqrt{x}$ for a Poisson; that for z, on the
other hand, shows the expected drop in the centre, due to the use of
$\pm\tfrac12$. For larger n, the curve for z will, of course, become more
stable.





The “large sample efficiencies” of y and z, like those of $\sqrt{x}$ and
$\sqrt{x + \tfrac12}$ (see previous discussion of these variates on p. 69), remain
high, that of y having a minimum of about 91 per cent. at about
m = 1/9, that of z about 96 per cent. in the neighbourhood of the
middle of the range. For the continuous curve, S log [x/(1 − x)] is
the sufficient statistic for estimating the mean.

Of the two variates y and z, z would naturally be used if the
observed limit of the range were near the possible limit; otherwise,
y would be available.



Fig. 3.*—The three curves (the largest variance first) are for

1. Binomial (n = 10), y,
2. Type I ($\alpha + \beta = 9$), y,
3. Binomial (n = 10), z.


5. Discussion on Binomial Variation. 

The question of how far we shall in practice find it necessary or 
advisable to use y or z is, however, still rather an open one. Since a 
controlled experiment of the type under discussion is often a laboratory 
or greenhouse experiment with homogeneous conditions, we shall often 
find the use of $\chi^2$ permissible. If not, with n reasonably large, we
shall find the direct analysis of the variate valid near the centre of
the range, and the square root, say, near one end. Even for an
intermediate position, if we are doubtful of the validity of a direct 

* The values of the variance scale along the $\sigma^2$ axis have inadvertently
been given 0.005 too high a value.







analysis, it should be noted that we can theoretically arrive at a
roughly appropriate square-root transformation. For we have

$$\sigma^2 = np(1 - p).$$

Let $p = p_0 + \lambda$, where $\lambda$ is small. Then we have

$$\sigma^2 = np(1 - 2p_0) + np_0^2$$

approximately, whence the variate $\sqrt{x + \mu}$, or better $\sqrt{x + \mu + \tfrac12}$,
where

$$\mu = \frac{np_0^2}{1 - 2p_0},$$

will tend to have constant variance.

It would hardly seem necessary to consider the use of y or z unless
none of these methods, applicable to particular cases, was possible.
The decision to adopt a particular scale is for data involving small 
integral numbers not altogether an easy one, for any practicable 
analysis must be to some extent approximate, and any transformation 
giving more computational labour, or greater obscurity in any 
presentation of the results, will only be worth while if our analysis is 
likely to have become proportionately more exact. 

As a simple example of this type of data, consider the following 
results of an experiment on wheat germination, carried out in pots 
under glass. 


Table IV.

Number of Seeds not Germinating (out of 50).

Block      1.    2.    3.    4.    5.    6.    7.
  A        10    11     8     0     7     6     9
  B         8    10     3     7     9     3    11
  C        13    11     2     8    10     7    11
  D         1     6     4    13     7    10    10


The experiment consisted of four blocks of six treatments (6 and
7 same treatment). If we attempt first of all to use $\chi^2$, we have for
$\chi^2$ between the six treatments,

$$\chi^2\ (5\ \text{d.f.}) = 14.30,$$

which exceeds the P = 0.05 level of significance. We notice at the
same time, however, that: (1) the dummy difference 6 v. 7 gives

$$\chi^2\ (1\ \text{d.f.}) = 4.03,$$

which is also significant; (2) although the individual numbers are
somewhat too small for $\chi^2$ to be used on them, comparison between
A + B v. C + D for each treatment gives

$$\chi^2\ (7\ \text{d.f.}) = 12.77,$$

which is not quite significant, but large enough, especially in conjunction
with (1), to make us doubt that our treatments value of $\chi^2$
has no contribution arising from heterogeneity of our material.
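
The dummy comparison 6 v. 7 can be reproduced as a check. In this sketch the totals 26 and 41 are the sums over the four blocks of columns 6 and 7 of Table IV as they are read here (an assumption, since the table is not everywhere legible), each out of 200 seeds; the function name is introduced for illustration.

```python
def chi_square_two_groups(count_1, count_2, n_each):
    """Chi-square (1 d.f.) for two groups of n_each seeds, comparing the
    observed numbers with those expected from the pooled proportion."""
    total = count_1 + count_2
    table = [[count_1, n_each - count_1], [count_2, n_each - count_2]]
    col_totals = [total, 2 * n_each - total]
    chi2 = 0.0
    for row in table:
        for observed, col_total in zip(row, col_totals):
            expected = n_each * col_total / (2 * n_each)
            chi2 += (observed - expected) ** 2 / expected
    return chi2

# Seeds not germinating in columns 6 and 7 (four pots of 50 seeds each).
chi2_dummy = chi_square_two_groups(26, 41, 200)
print(f"chi-square (1 d.f.) = {chi2_dummy:.2f}")   # about 4.03
```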

Deciding, therefore, that an analysis of variance is necessary, we
could reasonably, in view of the small differences between blocks and
treatments, analyse the data as they stand. The value of Fisher’s z
for treatment differences is found to be 0.363, which, with $n_1 = 5$,
$n_2 = 19$, is not significant. No further consideration of the data is
really necessary in this case, but to illustrate the use of the transformation
previously mentioned, we note that

$$p = \frac{3}{20}$$

approximately, whence $\mu = 1.60$, and the variate we might choose
to consider is, say, $\sqrt{x + 2}$. With this variate, the value of z
is 0.390.
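
The constant for this example follows from $\mu = np_0^2/(1 - 2p_0)$ with n = 50 and $p_0 = 3/20$. The sketch below also compares the binomial variance with its linearised form $(1 - 2p_0)(np + \mu)$ at neighbouring values of p; the function name is introduced here for illustration.

```python
def stabilising_constant(n, p0):
    """mu = n * p0^2 / (1 - 2 * p0), so that sigma^2 ~ (1 - 2*p0) * (n*p + mu)."""
    return n * p0**2 / (1 - 2 * p0)

n, p0 = 50, 3 / 20
mu = stabilising_constant(n, p0)
print(f"mu = {mu:.2f}")   # about 1.61 (quoted as 1.60 in the text)

# Binomial variance against its linearised form near p0.
for p in (0.10, 0.15, 0.20):
    exact = n * p * (1 - p)
    linearised = (1 - 2 * p0) * (n * p + mu)
    print(f"p = {p:.2f}: exact {exact:.3f}, linearised {linearised:.3f}")
```

Adding the further $\tfrac12$ gives about 2.1, which is the origin of the round figure 2 in $\sqrt{x + 2}$.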


6. Further Notes. 

Another possible type of data occurs in the field when, although 
the number observed, say the number of diseased plants, is of a 
Poisson type, yet the total number of plants in the plots can be 
counted in each case. If the numbers of diseased plants are small,
and the total numbers fairly uniform, the method of using $\sqrt{x}$, as before,
would probably be found suitable, with a possible use of covariance
on total number of plants if worth while. If the total
numbers are very variable, percentage infection could first be cal¬ 
culated, and this or an appropriate transformation of it analysed; even 
after the calculation of these percentages it might be found that a 
correlation with total numbers existed. 

Finally, the splitting up of treatment effects into main and 
interaction effects has not been considered here. It may happen 
that any further analysis of this kind might most rationally be based 
on a different scale from either the original or the transformed one; 
this would, for example, be so if the data were of the type reducible 
to a complex contingency table (3), if we were prepared to neglect
possible heterogeneity. The convention to adopt with interactions 
when heterogeneity exists can hardly, however, be decided until 
further experience of complex data not analysable either as a con¬ 
tingency table or by the direct use of analysis of variance is obtained. 


I am indebted to the staff at the Jealott’s Hill Research Station
of Imperial Chemical Industries, Limited, for any experimental data
referred to in this paper.

7. Summary. 

In spite of the wide range of applicability of the direct use of 
analysis of variance, data sometimes occur which are more validly 
analysed on another scale. The frequent value of the square-root 
transformation for analysing field variation of a Poisson type is 
noted, and this transformation considered in some detail. Examples 
are given of data for which it has been of use. 

Some discussion is also added on more controlled variation of the 
binomial type, and the analysis of this kind of data considered in cases 
where the use of $\chi^2$ is not valid.

It should, of course, again be stressed that the most valid analysis
of data where the variation is something of an unknown quantity 
cannot be decided a priori without careful consideration of the 
variation that has actually occurred. The theoretical results in this 
paper may, however, be of value in making us more familiar with 
transformations which we sometimes have to consider. 

References. 

1 Mattick, A. T. R., McClemont, J., and Irwin, J. O., J. Dairy Research, VI
(1935), 130-147.

2 Yates, F., J.R.S.S. Supplt., I (1934), 217-235.

3 Bartlett, M. S., J.R.S.S. Supplt., II (1935), 248-252.





Tests of Significance in Analysis of Covariance. 

By John Wishart, M.A., D.Sc.

The method of covariance analysis for correcting an observational
measure for variations in one or more correlated variables was
described by Fisher in 1932 (1), while the use by some workers of
tests of significance in this connection was put on a firm basis by a
demonstration of how the z-test should be applied (Bartlett (2),
Fisher (3), E. S. Pearson (4)). Recently we have had an example in the
Supplement of the analysis as applied to data with two independent
variables by Brady (5). None of these workers, however, has dealt
with the question of the calculation of standard errors for the treatment
means when adjusted for regression, Brady, for example, being
content to discuss the differences between his corrected means
without stating a standard error. It is not difficult to remedy this
omission. The case of one independent variable has been dealt with
by Wishart and Sanders (6), and is reproduced below. Even although
the algebra is elementary, it seems worth while to put on
record here the formula for two independent variables as applied
during the summer at my suggestion to Brady's data by Mr. N. H.
Carrier. The generalization to any number of independent variables
should then be clear.

The only difficulty arises from the fact that the corrected treatment
means are no longer independent. Thus, following the
notation of (6), p. 54, but dropping dashes, the corrected mean is of
the form:

$$y - b(x - \bar{x})$$

where b is calculated from the error line of the analysis of variance
and covariance. The difference, however, of two such means is,
say:

$$(y_p - y_q) - b(x_p - x_q) \qquad (1)$$

and is seen to be composed of two independent parts. The estimated
variance of the first part is $2s^2/r$, if each y is the mean of r observations,
and $s^2$ is the error mean square in the analysis of residual variance
after correcting for regression, while that of b is $s^2/A$, where A is the
sum of squares for the x-variable in the error line of the analysis.
Variations in x we do not have to consider, i.e. we may regard x as
fixed from sample to sample. It follows that the variance of the
difference in (1) is estimated by the expression

$$s^2\{2/r + (x_p - x_q)^2/A\}. \qquad (2)$$





The ratio of the difference (1) to the square root of the expression (2) is
distributed as t, with n degrees of freedom, n being one less than the
original number of error degrees of freedom.

Had the regression been known exactly, and not estimated from
the data, the estimated variance would have been $2s^2/r$. This should
therefore be corrected by multiplying by:

$$1 + \tfrac12 r (x_p - x_q)^2 / A,$$

the second term being simply the ratio of the two appropriate sums
of squares of the x-variable, having 1 and n + 1 degrees of freedom
respectively. The correction is often by no means negligible, as
shown in (6).
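
As a small numerical illustration of expression (2) and of the correcting factor, the following sketch uses purely hypothetical values (none of them are taken from the paper); the function name is likewise introduced here.

```python
import math

def corrected_difference_variance(s2, r, x_diff, A):
    """Expression (2): s^2 * (2/r + (x_p - x_q)^2 / A)."""
    return s2 * (2 / r + x_diff**2 / A)

# Hypothetical values, chosen only for illustration (not from the paper).
s2, r, A = 0.30, 6, 40.0
x_diff = 2.5            # x_p - x_q
y_diff, b = 1.8, 0.25   # y_p - y_q and the error regression coefficient

variance = corrected_difference_variance(s2, r, x_diff, A)
factor = 1 + 0.5 * r * x_diff**2 / A     # multiplier applied to 2 s^2 / r
t = (y_diff - b * x_diff) / math.sqrt(variance)
print(f"variance = {variance:.4f}, correction factor = {factor:.3f}, t = {t:.2f}")
```

The two forms agree identically, since $s^2(2/r + d^2/A) = (2s^2/r)(1 + \tfrac12 r d^2/A)$ for any difference d.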

For two variables we follow the notation of Brady (5), but dropping
dashes. The corresponding corrected treatment difference is now:

$$\{y_p - y_q\} - \{b_1(x_{1p} - x_{1q}) + b_2(x_{2p} - x_{2q})\}. \qquad (3)$$

The parts in curled brackets are independent of one another, and
hence the required variance is the sum of the variances of the separate
parts. That of the first part is again estimated as $2s^2/r$, if each y is
the mean of r determinations, $s^2$ being the error mean square in the
analysis of residual variance after correcting for regression, and having,
let us say, n degrees of freedom, two less than the original number
before correction. For the second part we have to consider what is
the variance of $\lambda b_1 + \mu b_2$, say, where $\lambda$ and $\mu$ are constants. Now
we have:

$$b_1 = \{B\,S(yX_1) - P\,S(yX_2)\} \div (AB - P^2)$$
$$b_2 = \{A\,S(yX_2) - P\,S(yX_1)\} \div (AB - P^2) \qquad (4)$$

in which A and B are the sums of squares for $x_1$ and $x_2$, and P is the
sum of products, in the error line of the analysis of variance and
covariance, while $S(yX_1)$ and $S(yX_2)$ denote the sums of products of
y with $x_1$ and $x_2$ respectively, also from the error line (Q and R in
Brady's notation). The coefficient of any single y in $\lambda b_1 + \mu b_2$ is
therefore:

$$\{(\lambda B - \mu P)X_1 + (\mu A - \lambda P)X_2\} \div (AB - P^2).$$

Since the y's are all independent, the estimated variance of $\lambda b_1 + \mu b_2$
is seen to be:

$$\frac{s^2}{(AB - P^2)^2}\{A(\lambda B - \mu P)^2 + 2P(\lambda B - \mu P)(\mu A - \lambda P) + B(\mu A - \lambda P)^2\} = \frac{\lambda^2 B - 2\lambda\mu P + \mu^2 A}{AB - P^2}\,s^2. \qquad (5)$$

* X is here written to denote that the expression will usually be more
general than $x - \bar{x}$. Thus for a row-column lay-out X will be of the form

$$x_{pq} - x_{p\cdot} - x_{\cdot q} + x_{\cdot\cdot}.$$

Finally, therefore, we have the following expression for the estimated
variance of the corrected treatment difference (3):

$$s^2\left[\frac{2}{r} + \frac{\lambda^2 B - 2\lambda\mu P + \mu^2 A}{AB - P^2}\right] \qquad (6)$$

where $\lambda = x_{1p} - x_{1q}$ and $\mu = x_{2p} - x_{2q}$.

The t-test follows at once.

As an example, let us test the significance of Brady’s varietal
comparison between Victory II and Sandy, (5) p. 105.

$d = 4.722 - 4.189 = 0.533$

$s^2 = 0.3135$, $r = 27$, $\lambda = -2.89$, $\mu = -0.27$ (Tables IX and XIII)

$A = 142.5$, $B = 11.0886$, $P = 9.94$ (Table III)

whence $V(d) = 0.04174$, $s_d = 0.2043$.

$t = d/s_d = 2.61$, $n = 54$.

For $n = 54$, $P = 0.05$, $t = 2.005$.

Therefore the difference is significant at the 5 per cent. point. For
the other differences we have:

Sonas - Sandy: $d = 0.597$, $s_d = 0.2749$ (significant at the 5 per cent. point).

Sonas - Victory II: $d = 0.064$, $s_d = 0.2041$ (not significant).

Note that the largest difference is not the most significant, owing to
the occurrence of a larger standard error.
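
The arithmetic of the Victory II v. Sandy comparison can be retraced from the quantities quoted above. The sketch assumes that the variance of the corrected difference is $s^2[2/r + (\lambda^2 B - 2\lambda\mu P + \mu^2 A)/(AB - P^2)]$; the function name is introduced here for illustration.

```python
import math

def corrected_difference_variance(s2, r, lam, mu, A, B, P):
    """s^2 * (2/r + (lam^2*B - 2*lam*mu*P + mu^2*A) / (A*B - P^2))."""
    quad = (lam**2 * B - 2 * lam * mu * P + mu**2 * A) / (A * B - P**2)
    return s2 * (2 / r + quad)

# Values quoted in the text for Victory II v. Sandy.
d = 4.722 - 4.189
v = corrected_difference_variance(s2=0.3135, r=27, lam=-2.89, mu=-0.27,
                                  A=142.5, B=11.0886, P=9.94)
s_d = math.sqrt(v)
t = d / s_d
print(f"V(d) = {v:.5f}, s_d = {s_d:.4f}, t = {t:.2f}")
# prints: V(d) = 0.04174, s_d = 0.2043, t = 2.61
```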

Cases of three or more variables depend obviously on the more
complicated expressions for the partial regression coefficients.
Formula (5) when generalized, say, for three variables, will have in its
numerator a quadratic form in $\lambda$, $\mu$ and $\nu$, with coefficients which are
the co-factors of the sums of squares and products of the x-variables
in the determinant of these quantities, while the denominator is the
determinant itself. A formal proof of the general case is appended,
using the double suffix convention to denote summation.

Let the dependent variate y and the independent variates
$x_1, x_2 \dots x_s$ be all measured from their sample means. Then we
seek a relation of the form:

$$Y = b_i x_i \qquad (i = 1, 2 \dots s).$$

For this we require to minimize $S(y - Y)^2$, the summation being
over all the N observations in the sample,

i.e. $S(y - b_i x_i)(y - b_j x_j)$ is to be a minimum.

Differentiating with respect to $b_i$ we have:

$$S\,x_i(y - b_j x_j) = 0.$$

Thus we have s linear equations, of the form:

$$b_j S(x_i x_j) = S(x_i y).$$

Writing this $a_{ij} b_j = d_i$, where $a_{ij} = S(x_i x_j)$ and $d_i = S(x_i y)$, the solution
is:

$$b_j = \frac{A_{ij}}{A}\,d_i,$$

where $A_{ij} = A_{ji}$ is the co-factor of $a_{ij}$ in $A = |a_{ij}|$.

Thus, writing $c_{ij} = A_{ij}/A$, we have:

$$b_j = c_{ij} S(x_i y) \qquad (j = 1, 2 \dots s).$$

Consider now the linear form $\lambda_i b_i$ $(i = 1, 2 \dots s)$.

We assume that y is distributed normally in each array with a
variance ($\sigma^2$) which is the same for all arrays. Also the x-variates
may be regarded as fixed from sample to sample. Then $\lambda_i b_i$ is a
weighted mean of the N values of y, each with weight $\lambda_i c_{ij} x_j$, and we
have:

(variance of $\lambda_i b_i$)$/\sigma^2$

$= S(\lambda_i c_{ij} x_j \, \lambda_u c_{ul} x_l)$ (S over the sample from 1 to N)

$= \lambda_i \lambda_u c_{ij} c_{ul} S(x_j x_l)$

$= \lambda_i \lambda_u c_{ij} c_{ul} a_{jl}$

$= \lambda_i \lambda_u c_{ij} A_{ul} a_{jl} / A$

$= \lambda_i \lambda_u c_{ij} \delta_{uj}$, where $\delta_{uj} = 1$ $(u = j)$, $= 0$ $(u \neq j)$,

$= \lambda_i \lambda_j c_{ij}$ $(i, j = 1, 2 \dots s)$.

The required result follows as in the case of two independent 
variables. 
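
The general result, that the variance of $\lambda_i b_i$ is $\sigma^2 \lambda_i \lambda_j c_{ij}$ with $c_{ij}$ the reciprocal of the matrix of sums of squares and products, can be checked numerically against the two-variable formula. The sketch below is in plain Python; the helper names `solve` and `regression_term` are introduced here, and the numerical values are those quoted earlier for Brady's data.

```python
def solve(a, rhs):
    """Solve a . z = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    m = [list(row) + [rhs[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(m[i][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for i in range(n):
            if i != col:
                ratio = m[i][col] / m[col][col]
                m[i] = [x - ratio * y for x, y in zip(m[i], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def regression_term(a, lam):
    """lam_i * lam_j * c_ij, where c is the reciprocal of the matrix a of
    error sums of squares and products of the x-variables."""
    z = solve(a, lam)
    return sum(l * v for l, v in zip(lam, z))

# Two-variable check with the sums of squares and products quoted in the text.
A, B, P = 142.5, 11.0886, 9.94
lam = [-2.89, -0.27]
general = regression_term([[A, P], [P, B]], lam)
two_var = (lam[0]**2 * B - 2 * lam[0] * lam[1] * P + lam[1]**2 * A) / (A * B - P**2)
print(f"{general:.6f} {two_var:.6f}")
```

For three or more variables the same function applies with a larger matrix, the quantity $2/r$ being added and the whole multiplied by $s^2$ as before.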

References. 

1, 3 Fisher, R. A., Statistical Methods for Research Workers, 4th Edn., 1932;
5th Edn., 1934 (Oliver and Boyd).

2 Bartlett, M. S., Proc. Camb. Phil. Soc., 1934, 30, 164-9.

4 Pearson, E. S., J. Roy. Statist. Soc. Suppl., 1934, 1, 178-81.

5 Brady, J., J. Roy. Statist. Soc. Suppl., 1935, 2, 99-106.

6 Wishart, J., and Sanders, H. G., Principles and Practice of Field Experimentation,
1935 (Empire Cotton Growing Corporation).





CORRESPONDENCE. 

Complex Experiments. 

To the Editors of the Supplement to the Journal of the Royal 
Statistical Society. 

When reading his paper before the Industrial and Agricultural
Research Section of the Society on May 23rd, 1935,* Mr. Yates defined
what he meant by first-order interaction of, say, N and K, namely

$$N \times K = nk - n - k + (1) \qquad (1)$$

His second-order interactions, e.g. of N, K and P, were defined as
follows:

$$N \times K \times P = (npk - nk - pk + k) - (np - n - p + (1)) \qquad (2)$$

I contributed to the discussion of the paper showing that in
the experiments as described by the author the interactions are not
likely to be detected even if they are very considerable. Since the
meeting Mr. Yates has changed the definition of the interactions.
In fact, instead of (1) we find (page 190)

$$N \times K = \tfrac12 (nk - n - k + (1)) \qquad (3)$$

and instead of (2) (page 194)

$$N \times K \times P = \tfrac12\{\tfrac12(npk - nk - pk + k) - \tfrac12(np - n - p + (1))\} \qquad (4)$$

The change in the definition of what is called an interaction does
not influence my final conclusions, but it does influence the argument
I employed to the point of making it incomprehensible. Unfortunately
I was not warned in time, and could not adjust my
remarks to the final text of the paper. For this reason I take the
liberty of asking for permission to publish the following lines, which
I hope will help the reader to understand my remarks.

Any author is at liberty to introduce new terms and to ascribe
to them any meaning he likes. When judging the work, others may
use the conceptions introduced by the author if they consider them
suitable; if not, they may use other conceptions. In the present
case, when we are interested in the accuracy with which we may
determine “the difference of the responses to k in the presence and
absence of n,” I do not think that the interaction N × K (as last
defined by Mr. Yates, page 190, or my formula (3) above), is a
suitable measure. Neither do I think that the second-order interaction
(as last defined by Mr. Yates, page 194, or my formula (4)) is
suitable for similar purposes.

* Supplement to the J.R.S.S. Vol. II, No. 2, pp. 181-247.

Any effect of agricultural treatment must be referred to a specified
unit of area. Thus the response to a manure equal to 1 cwt. may be
deemed considerable if it is obtained on a plot of moderate size,
and will be thought negligible if obtained on a field of, say, 100 acres.
A convenient unit of area in field experiments is that equal to the size
of plot, and a convenient unit to measure the effects and errors in
their estimation is the average yield per plot. Therefore, if we are
interested in the difference, which I shall denote by $\Delta_k$, of the effects of
k in the presence and absence of n, we have to refer this difference to
some unit of area, and it is convenient to refer it to the area equal to
that of the experimental plot. To estimate this difference, $\Delta_k$, we
must have at least four plots, one with nk, one with k, one with n
and one non-manured. For instance, the plots could be those of the
same block. The elementary estimate of $\Delta_k$ obtained from four
plots of the j-th block will be, say,

$$(nk_j - n_j) - (k_j - (1)_j). \qquad (5)$$

If each of the treatments is repeated three times, this yields us three
elementary estimates similar to (5), and we may usefully calculate
their mean to get a final estimate of the difference $\Delta_k$ we are interested
in. Thus if we are interested in the difference in the effects of k
in the presence and absence of n, and if we wish this difference to refer
to the unit of area equal to the area of the experimental plot, so as to be
comparable with the average yield per plot, then we should not calculate
what Mr. Yates now calls an “interaction,” but the double of that
quantity, i.e. we are brought back to his original definition given
in my formula (1). The interaction as defined in the final draft of
the paper represents the “difference in effects of k in the presence
and absence of n” obtainable not from the whole, but from one half
of the experimental plot. Similarly the new definition of second-order
interaction represents the particular effect as obtainable not
from the full-size experimental plot, but from ¼ of the same.

The factors ½ and ¼ introduced in the definitions are purely
artificial means of equalizing the standard errors. If the effects
described as interactions are referred to the common unit of area
equal to that of the experimental plot, then the S.E. of first-order
interaction will appear to be double, and the S.E. of second-order
interaction four times that of the “main” effect.

If we wish to discuss the accuracy of the estimates of interactions
and to express the limits of errors which are likely to be committed
in terms of the general mean yield, we should deal with the particular
effects referring to full-size plots, and not to fractions of the same,
and thus keep to the old definitions. This has been done in my
remarks contributed to the discussion, which I hope now will be clear
to the reader. J. Neyman.

[Mr. Yates writes:

I am sorry if Dr. Neyman feels that my change of definition has
made his argument incomprehensible. I cannot, however, believe
that this is the case. A conventional change of units of this kind can
only cause confusion if there is some misunderstanding as to what
units each writer has in mind. As I have given explicit indication
of my change both in a footnote to the text and in my reply, there
should be no risk of this here.

Nor do I find myself in any closer agreement with the fresh
argument Dr. Neyman now puts forward. The factors ½ and ¼ are
not, as Dr. Neyman now states, “purely artificial means of equalizing
the standard errors.” They have the more important function of
enabling the effects of one factor in the presence or absence of another
to be deduced by adding or subtracting the appropriate interaction
from the main effect. Thus in terms of my new definition:

the effect of n in the presence of k = N + N × K,
the effect of n in the absence of k = N − N × K,

as was stated explicitly in the final draft of my paper; whereas in
terms of the definition Dr. Neyman prefers:

the effect of n in the presence of k = N + ½(N × K),
the effect of n in the absence of k = N − ½(N × K).

The practical interpretation of the meaning of interactions, however
defined, rests on these expressions. With them in mind, it
should be clear that with Dr. Neyman's definition errors in the
interactions are of half the importance of errors of the same magnitude
in the main effects. To express both as percentages of the mean
yield and treat them as if they were of equal importance seems to
me to be merely misleading.]
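
The two conventions can be compared on a small numerical case. In the sketch below the four plot yields are hypothetical, and the main effect N is taken as the average response to n over the two levels of k, which is an assumption, since that definition is not restated in this correspondence.

```python
# Hypothetical plot yields (arbitrary units), for illustration only.
yields = {"(1)": 20.0, "n": 26.0, "k": 23.0, "nk": 35.0}
nk, n, k, one = yields["nk"], yields["n"], yields["k"], yields["(1)"]

# Main effect of n, taken here as the average response to n (an assumption).
main_n = 0.5 * ((nk - k) + (n - one))

interaction_original = nk - n - k + one            # formula (1)
interaction_revised = 0.5 * interaction_original   # formula (3)

effect_with_k = nk - k        # response to n in the presence of k
effect_without_k = n - one    # response to n in the absence of k

# Revised definition: N + N x K and N - N x K.
print(effect_with_k, main_n + interaction_revised)
print(effect_without_k, main_n - interaction_revised)
# Original definition: N + (N x K)/2 and N - (N x K)/2.
print(effect_with_k, main_n + 0.5 * interaction_original)
print(effect_without_k, main_n - 0.5 * interaction_original)
```

Both conventions recover the same responses; they differ only in whether the factor ½ is carried in the definition of the interaction or in the expression that uses it.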




SUPPLEMENT 

TO THE 

JOURNAL OF THE ROYAL STATISTICAL SOCIETY 
Vol. III., No. 2, 1936.

Inverse Interpolation 
and 

Scientific Applications of the National Accounting 
Machine 

By L. J. Comrie.

[Being the substance of a lecture given before the Industrial and Agricultural 
Research Section of the Royal Statistical Society, January 30 th, 
Mr. S. P. Vivian, C.B., in the Chair.] 

Inverse Interpolation 

Introduction .—Direct interpolation is a commonplace and familiar 
operation, although many scientists are averse to using second, 
third and fourth differences. For such as these inverse interpolation 
is an evil to be avoided, involving, according to the meagre accounts 
in textbooks, the solution of higher-order equations, reversed series 
or numerous approximations. The method here proposed has been 
in successful use in the Nautical Almanac Office for two years, and 
is described in full (together with direct interpolation) in the Nautical 
Almanac for 1937 , which contains also the tables required in its 
application.* 

Notation .—The notation employed, which seems, on its merits, 
the best for general use,† is as follows:

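                                    Differences
Function    First              Second          Third               Fourth           Fifth

$f_{-2}$
            $\Delta'_{-3/2}$
$f_{-1}$                       $\Delta''_{-1}$
            $\Delta'_{-1/2}$                   $\Delta'''_{-1/2}$
$f_0$                          $\Delta''_0$                        $\Delta^{iv}_0$
            $\Delta'_{1/2}$                    $\Delta'''_{1/2}$                    $\Delta^{v}_{1/2}$
$f_1$                          $\Delta''_1$                        $\Delta^{iv}_1$
            $\Delta'_{3/2}$                    $\Delta'''_{3/2}$
$f_2$                          $\Delta''_2$
            $\Delta'_{5/2}$
$f_3$

Note that

$$\Delta''_0 = \Delta'_{1/2} - \Delta'_{-1/2}, \qquad \Delta^{iv}_0 = \Delta'''_{1/2} - \Delta'''_{-1/2}.$$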

* Reprints of these tables, and their application, under the title Interpolation
and Allied Tables, are on sale by H.M. Stationery Office, price 1s.

† The retention of Δ for obsolete forward differences has led to the use of
δ for central differences. Here Δ is used for central differences (the only
differences seriously used) and δ for the differences of subtabulated values.
The use of plain numerical indices to denote the order of a difference is open
to grave objections.





Fundamental Formulae.—For direct interpolation the textbooks
give formulae due to Newton, Stirling, Lagrange, Gauss, Bessel and
Everett, of which the last two are most favoured by practical computers.
For our present purpose the best formula is that of Bessel,
which may be written

$$f_n = f_0 + n\Delta'_{1/2} + B''(\Delta''_0 + \Delta''_1) + B'''\Delta'''_{1/2} + B^{iv}(\Delta^{iv}_0 + \Delta^{iv}_1) + \dots$$

where n is the fraction of the interval and where

$$B'' = \frac{n(n-1)}{2 \cdot 2!}, \qquad B''' = \frac{n(n-1)(n-\tfrac12)}{3!}, \qquad B^{iv} = \frac{(n+1)n(n-1)(n-2)}{2 \cdot 4!}.$$

It will be noted that the values of $B''$ and $B^{iv}$ are half those usually
given; this allows $\tfrac12(\Delta''_0 + \Delta''_1)$, etc., to be replaced by the more
convenient $\Delta''_0 + \Delta''_1$, etc. The formula may be rewritten

$$f_n - B''(\Delta''_0 + \Delta''_1) - B'''\Delta'''_{1/2} - B^{iv}(\Delta^{iv}_0 + \Delta^{iv}_1) = f_0 + n\Delta'_{1/2}$$

and will be used in this form for inverse interpolation.
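
Bessel's formula with the halved coefficients can be tried on a function whose fifth differences vanish, for which it should be exact when carried to fourth differences. The sketch below is illustrative only: the function `bessel_interpolate`, the layout of the table as a dictionary, and the quartic used as a test are all introduced here.

```python
def bessel_interpolate(f, n):
    """Bessel's formula to fourth differences, with halved B'' and B^iv.

    f maps the integer arguments -2, -1, 0, 1, 2, 3 to function values;
    n is the fraction of the interval between arguments 0 and 1.
    """
    d1_half = f[1] - f[0]
    d2 = {k: f[k + 1] - 2 * f[k] + f[k - 1] for k in (-1, 0, 1, 2)}
    d3_half = d2[1] - d2[0]
    d4 = {k: d2[k + 1] - 2 * d2[k] + d2[k - 1] for k in (0, 1)}

    b2 = n * (n - 1) / (2 * 2)                        # B''  : half the usual value
    b3 = n * (n - 1) * (n - 0.5) / 6                  # B'''
    b4 = (n + 1) * n * (n - 1) * (n - 2) / (2 * 24)   # B^iv : half the usual value

    return (f[0] + n * d1_half + b2 * (d2[0] + d2[1])
            + b3 * d3_half + b4 * (d4[0] + d4[1]))

def quartic(x):
    # Hypothetical test function; its fifth differences vanish.
    return x**4 - 2 * x**3 + x

table = {k: quartic(k) for k in range(-2, 4)}
print(bessel_interpolate(table, 0.3), quartic(0.3))
```

Inverse interpolation, as in the rewritten form of the formula, would proceed by correcting $f_n$ for the higher-difference terms with a trial n and solving the remaining linear relation for n.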

Mechanical Calculators.—Any of the modern calculating machines
may be used in the process to be described. For work of this
nature, however, and for non-routine scientific calculations generally,
the writer prefers the Brunsviga 20 (Fig. 1), of capacity 12 × 11 × 20,
although its chief virtue, the ability to transfer a number from
the product register to the setting levers, is not immediately called
for; its ability to count forward or positive turns in the multiplier
register in white figures, and backward turns in red figures, without
the reversal of any levers, is a great asset. For brevity the following
contractions are used:


S.L. = setting levers (= keyboard in keyboard machines)
M.R. = multiplier or revolution register
P.R. = product register.


Linear Interpolation. In order to establish certain principles, consider first a direct linear interpolation, e.g. the finding of y when x = 6.41967.

    x       y         Δ'
    6    2974063
                    +68754
    7    3042817

The working equation is

$$f_n = f_0 + n\Delta'_{1/2}$$






[Fig. I. Brunsviga Calculating Machine, Model 20.]





















Doubtless many computers would first copy $f_0$ or 2974063. Then $\Delta'_{1/2}$ or 68754 would be set on the machine, multiplied by 0.41967, and the product (rounded off) written under $f_0$. The sum (or difference) is the required interpolate. This process is subject to the following formidable list of possibilities of error:

(1) $f_0$ may be copied incorrectly.

(2) $\Delta'_{1/2}$ may be set incorrectly on the S.L.

(3) $\Delta'_{1/2}$, as given in the table, or as formed mentally, may be incorrect.

(4) $n$ may be taken incorrectly.

(5) $n\Delta'_{1/2}$ may be transcribed incorrectly.

(6) Too many or too few figures may be taken in $n\Delta'_{1/2}$.

(7) $n\Delta'_{1/2}$ may be added (or subtracted) incorrectly.

(8) $n\Delta'_{1/2}$ may be taken with the wrong sign.

These possibilities, with the exception of (4), can be eliminated. Set $f_0$ on the right of the S.L. and multiply by $1 \times 10^N$, where $N$ is the number of decimals in $n$; thus the multiplier is here 1.00000. Without moving the carriage, clear the S.L. and M.R. and set $\Delta'_{1/2}$ on the right of the S.L.; turn the handle forward for a positive $\Delta'_{1/2}$, and vice versa. Verify carefully that the P.R. contains $f_1$. The machine now records that

$$f_1 = f_0 + 1.00000\,\Delta'_{1/2}$$

By changing the M.R. to 0.41967, we obtain the desired result, namely 3002917. If $f_1$ has been carefully verified, and if $n$ is correct, the result may be deemed to be free from error.

In an inverse linear interpolation, e.g. the finding of x when y = 3002917, we proceed exactly as before to the stage where $f_1$ has been formed and examined. By turning the handle the P.R. is now changed from $f_1$ to $f_n$; this changes the M.R. from 1.00000 to $n$, i.e. to 0.41967.

The similarity between the two processes is very striking. For direct interpolation we produce the known $n$ in the M.R. and read the unknown $f_n$ from the P.R.; for inverse interpolation we produce the known $f_n$ in the P.R. and read the unknown $n$ from the M.R.
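The symmetry of the two processes can be seen in a short sketch. It is an arithmetical model only, not a simulation of the Brunsviga; the function names are hypothetical, and the multiplier register is represented simply by the value of n.

```python
def direct_linear(f0, d1, n):
    """Direct interpolation: n is known (multiplier register), f_n is read off."""
    return f0 + n * d1


def inverse_linear(f0, d1, fn, decimals=5):
    """Inverse interpolation: f_n is known (product register), n is read off."""
    return round((fn - f0) / d1, decimals)


f0, d1 = 2974063, 68754                   # tabular value at x = 6 and its first difference
y = round(direct_linear(f0, d1, 0.41967))
n = inverse_linear(f0, d1, 3002917)
print(y, n)                               # the text gives 3002917 and 0.41967
```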

Inverse Interpolation with Second and Third Differences. Suppose we wish to find x when y = 9374.

    x       y       Δ'      Δ''     Δ'''
    1.2    9320             -92
                   +316              -6
    1.3    9636             -98

The working equation is now

$$f_n - B''(\Delta''_0 + \Delta''_1) = f_0 + n\Delta'_{1/2}$$

since the third difference is here negligible. Ignoring for the moment the term $-B''(\Delta''_0 + \Delta''_1)$, we find an approximate 2-figure value of $n$ as before, i.e. $n = 0.17$, with which we can enter a two-page table (N.A. Table XXIII), giving $B''(\Delta''_0 + \Delta''_1)$ with arguments $n$ and $\Delta''_0 + \Delta''_1$. The relevant part of this table is shown alongside. Here, with $\Delta''_0 + \Delta''_1 = -190$, the correction to be applied (mentally) to $f_n$ is -7. Turning the machine (still with a 2-figure $n$) to 9367, we find $n = 0.15$, which has changed the correction to -6, so that by bringing the machine to 9368 we get the final $n$ of 0.152, or x = 1.2152.

Extract from N.A. Table XXIII [only the first row of entries is legible in this copy]:

    n       185    190    195      n
    0.14     6      6      6      0.86
     .15                           .85
     .16                           .84
     .17                           .83
    0.18                          0.82

In this form the method is limited by the table used ($\Delta''_0 + \Delta''_1$ goes to 200 only) and by the rounding off of $B''(\Delta''_0 + \Delta''_1)$. To overcome these limitations we may use two machines, devoting one, the left-hand machine or L.H.M., to the left-hand side of the equation and the other, the right-hand machine or R.H.M., to the right-hand side of the equation, i.e.

$$\underbrace{f_n - B''(\Delta''_0 + \Delta''_1) - B'''\Delta'''_{1/2} - \text{etc.}}_{\text{L.H.M.}} = \underbrace{f_0 + n\Delta'_{1/2}}_{\text{R.H.M.}}$$

The process consists in securing a balance between the two machines, with $B''$, $B'''$, etc., depending strictly on the value of $n$ on the R.H.M. As $B'''$ is not very sensitive to small changes in $n$, it is easy to find a value of $n$ that gives a final value of $B'''$, while $B''$ can be revised as closer approximations to $n$ are found.

Reverting to the example, in which third differences are neglig¬ 
ible, the stages required are illustrated and described. Negative 
multipliers (in red on Brunsviga machines) are printed in italic. 


                 Left-hand Machine                Right-hand Machine
    Stage    S.L.     M.R.     P.R.           S.L.     M.R.     P.R.
      1                                       9320     1.000    9320 000
      2                                        316     1.000    9636 000
      3      9374     1.000    9374 000         "      0.170    9373 720
      4       190     0.035    9367 350         "      0.150    9367 400
      5        "      0.032    9367 920         "      0.152    9368 032


(1) Set 9320 on right of S.L. of R.H.M. and multiply by 1.000 (as we wish to have three decimals in $n$).

(2) Clear S.L. and M.R., set $\Delta'_{1/2}$, and turn (here forwards) to produce $f_1$. Check $f_1$.

(3) Turn till the P.R. shows $f_n$ as closely as possible with two figures in $n$. Set $f_n$ on S.L. of L.H.M. and multiply by 1.000 (as we wish to have three decimals in $B''$).

(4) Set $\Delta''_0 + \Delta''_1$ on S.L. of L.H.M. With the approximate value of $n$ find $B''$ (N.A. Table XIX, illustrated alongside) and form $-B''(\Delta''_0 + \Delta''_1)$, which has always the same sign as $\Delta''_0 + \Delta''_1$. Now turn the R.H.M. until it balances the L.H.M. as closely as possible, getting $n = 0.150$, for which $B'' = -0.032$.

(5) Revise the value of $B''$ on the L.H.M. and equalise again, getting $n = 0.152$. As this has not changed $B''$, we have the desired result, namely x = 1.2152.
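The balancing of the two machines amounts to a fixed-point iteration on n, which the following sketch imitates for the example above. It is an assumption-laden model: `inverse_second_diff` is a hypothetical name, the tabulated $B''$ is imitated by rounding the exact coefficient to three decimals, and "balancing" is taken to mean rounding the resulting n to three decimals.

```python
def b2(n):
    """B'' as defined in the text: n(n - 1) / (2 * 2!)."""
    return n * (n - 1) / 4


def inverse_second_diff(f0, d1, d2_sum, fn, decimals=3, max_iter=20):
    # Stage (3): first approximation from the linear terms, two figures only.
    n = round((fn - f0) / d1, 2)
    for _ in range(max_iter):
        coeff = round(b2(n), decimals)             # B'' as read from a table
        lhs = fn - coeff * d2_sum                  # left-hand machine
        new_n = round((lhs - f0) / d1, decimals)   # right-hand machine balanced
        if new_n == n:                             # B'' no longer changes
            break
        n = new_n
    return n


n = inverse_second_diff(9320, 316, -92 - 98, 9374)
print(n, round(1.2 + 0.1 * n, 4))  # the text finds n = 0.152, x = 1.2152
```

Traced by hand, the iteration passes through n = 0.17, 0.150 and 0.152, which are the values shown in stages 3 to 5 of the table.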

As an example where third differences are not negligible, find x when y = 31923.

    x       y        Δ'        Δ''      Δ'''     Δ^iv
    1.2    35263               -528
                    -10844              +224
    1.3    24419               -304               +2

                 Left-hand Machine                Right-hand Machine
    Stage    S.L.     M.R.     P.R.           S.L.      M.R.      P.R.
      1                                       35263     1.0000    35263...
      2                                       10844     1.0000    24419...
      3      31923    1.000    31923...         "       0.3100    31901...
      4        224    0.007    31921...         "         "         "
      5        832    0.053    31877 3...       "       0.3122    31877 5
      6         "     0.054    31876 5...       "       0.3123    31876 4


Extract from N.A. Table XIX (critical table of $B''$), as used in stage (4) of the first example:

    n          B''
    0.1478
               -.032
     .1535
               -.033
     .1594
               -.034
     .1653
               -.035
    0.1713

Stages (1), (2) and (3) are the same as before. Then

(4) Set $\Delta'''_{1/2}$ and form the product $-B'''\Delta'''_{1/2}$.

(5) Set $\Delta''_0 + \Delta''_1$ and multiply by $B''$, found as -0.053 with argument $n = 0.31$. Equalise the machines, getting $n = 0.3122$.

(6) Revise $B''$ to -0.054, and, by equalising again, get $n = 0.3123$, which yields the same $B''$ as before, so that x = 1.23123.

Extracts from the critical tables of $B''$ and $B'''$ used in the example above:

    n          B''             n          B'''
    0.3000                     0.2735
               -.053                      +.007
    0.3102                     0.3210
               -.054
    0.3211

Throw-back. It will be observed that

$$B^{iv} = \frac{(n+1)n(n-1)(n-2)}{2 \cdot 4!} = B''\,\frac{(n+1)(n-2)}{12}$$

Now $\tfrac{1}{12}(n+1)(n-2)$ varies slowly as $n$ varies from 0 to 1. If we assign to it the constant value -0.184, Bessel's formula becomes

$$f_n = f_0 + n\Delta'_{1/2} + B''(\Delta''_0 - 0.184\Delta^{iv}_0 + \Delta''_1 - 0.184\Delta^{iv}_1) + B'''\Delta'''_{1/2}$$

or, if we write

$$M'' = \Delta'' - 0.184\Delta^{iv}$$

calling $M''$ the modified second difference, then

$$f_n = f_0 + n\Delta'_{1/2} + B''(M''_0 + M''_1) + B'''\Delta'''_{1/2}$$





the error of which is

$$(B^{iv} + 0.184B'')(\Delta^{iv}_0 + \Delta^{iv}_1)$$

From the values of this coefficient shown alongside, it is evident that, if the mean fourth difference does not exceed 1000, the use of the modified second difference $M''$ enables the effect of the fourth difference to be taken into account, with an error not exceeding half a unit of the last decimal.

This device, known as the throw-back, is perhaps the most valuable contribution that has been made to practical interpolation during the past decade. It enables whole columns of differences to be dispensed with in published tables, and saves the user the trouble of looking out (in general) two coefficients and performing two multiplications. It can, of course, be applied to higher-order differences. Thus

$$M^{iv} = \Delta^{iv} - 0.207\Delta^{vi} + 0.045\Delta^{viii}$$

may be used up to $\Delta^{vi} = 10{,}000$, the second term being negligible if $\Delta^{viii}$ is less than 200.
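The size of the throw-back error can be checked numerically. The sketch below evaluates the coefficient $B^{iv} + 0.184B''$ at tenths of the interval; the function names are illustrative, and the bound assumes, as in the text, a mean fourth difference of 1000 (so that $\Delta^{iv}_0 + \Delta^{iv}_1 = 2000$).

```python
def b2(n):
    """B'' = n(n - 1) / (2 * 2!)."""
    return n * (n - 1) / 4


def b4(n):
    """B^iv = (n + 1)n(n - 1)(n - 2) / (2 * 4!)."""
    return (n + 1) * n * (n - 1) * (n - 2) / 48


def throwback_coefficient(n, k=0.184):
    """Coefficient of (fourth-difference sum) in the throw-back error."""
    return b4(n) + k * b2(n)


for i in range(6):
    n = i / 10
    print(f"{n:.1f}  {throwback_coefficient(n):+.5f}")

# Largest error for a fourth-difference sum of 2000 (mean fourth difference 1000).
worst = max(abs(throwback_coefficient(i / 100)) for i in range(101))
print(worst * 2000)  # stays below half a unit
```

The coefficient is symmetrical about n = 0.5, so only the first half of the interval needs to be printed.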

Inverse Interpolation with Differences up to the Fourth. Consider the finding of x when y = 91234.

    x       y        Δ'       Δ''      Δ'''     Δ^iv     Δ^v
    0.2    91817             +1248              +100
                    -2070              -199              -32
    0.3    89747             +1049              +68

Here $M''_0 + M''_1 = +2297 - 0.184(168) = +2266$. For a published table the following entries would suffice:

    x       y        M''
    0.2    91817    +1230
    0.3    89747    +1036

For $\Delta'''_{1/2}$ we could use $M''_1 - M''_0 = -194$, which overcorrects for the fifth difference, but is actually slightly better than $\Delta'''_{1/2}$ itself.


                 Left-hand Machine                 Right-hand Machine
    Stage    S.L.     M.R.      P.R.           S.L.      M.R.      P.R.
      1                                        91817     1.0000    91817...
      2                                         2070     1.0000    89747...
      3                                          "       0.2800    91237...
      4                                          "       0.2200    91362...
      5      91234    1.0000    91234...         "       0.2300    91341...
      6        199    0.0080    91236...         "         "         "
      7       2266    0.0443    91336...         "       0.2324    91336...
      8        "      0.0446    91336 7...       "       0.2321    91336 6...


Values of the coefficient $B^{iv} + 0.184B''$:

    n      B^iv + 0.184B''      n
    0.0        0.00000         1.0
    0.1       -0.00022         0.9
    0.2       -0.00016         0.8
    0.3       +0.00001         0.7
    0.4       +0.00016         0.6
    0.5       +0.00022





Extracts from N.A. Tables XXIII and XXIV are shown below.

    n       110    115    120      n
    0.22     5      5      5      0.78
     .23     5      5      5       .77
     .24     5      5      5       .76
     .25     5      5      6       .75
     .26     5      6      6       .74
     .27     5      6      6       .73
    0.28     6      6      6      0.72

    n           B''              n           B'''
    0.22981                      0.18976
                -.0443                       +.0080
     .23055                      0.23343
                -.0444
     .23129
                -.0445
     .23204
                -.0446
    0.23279




Stages (1), (2) and (3) are the same as before. Then

(4) In order to obtain a preliminary value of $n$ that will give a final value of $B'''$, we get an approximate value of $B''(\Delta''_0 + \Delta''_1)$ from Table XXIII, taking 2266 as 20 x 115. This approximate value is 6 x 20 = 120, so the R.H.M. is brought nearly to 91234 + 120. Note the large change in $n$, which is due to the fact that $\Delta''_0 + \Delta''_1$ is large as compared with $\Delta'_{1/2}$.

(5) The revised correction, with argument 0.22, is 5 x 20 = 100, leading to a revised $n$ of 0.23. Set $f_n$ on the L.H.M. as usual.

(6) On L.H.M. set $\Delta'''_{1/2}$ and multiply by $B'''$.

(7) On L.H.M. set $M''_0 + M''_1$ and multiply by $B''$. Change R.H.M. so that it balances L.H.M., getting $n = 0.2324$.

(8) Revise $B''$ and balance again. The change in $n$ has not affected $B''$, so x = 0.22321.

Stage (4) may often be omitted, or a rough mental estimate of $B''(\Delta''_0 + \Delta''_1)$ made without consulting any tables. This example has been deliberately chosen to illustrate a case where the second difference has a large influence on $n$.
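The two-machine process with third differences and the throw-back can be imitated by the same kind of fixed-point iteration. The sketch below is a model under stated assumptions, not the author's procedure step for step: `inverse_bessel` is a hypothetical name, tabulated coefficients are imitated by rounding the exact $B''$ and $B'''$, and the starting value is simply the linear estimate rather than the two-figure preliminary values of stages (3) to (5).

```python
def b2(n):
    """B'' = n(n - 1) / (2 * 2!)."""
    return n * (n - 1) / 4


def b3(n):
    """B''' = n(n - 1)(n - 1/2) / 3!."""
    return n * (n - 1) * (n - 0.5) / 6


def inverse_bessel(f0, d1, d2_sum, d3, fn, n_decimals, coeff_decimals, max_iter=50):
    """Solve fn - B''*d2_sum - B'''*d3 = f0 + n*d1 for n by repeated balancing."""
    n = (fn - f0) / d1                               # linear first estimate
    for _ in range(max_iter):
        lhs = (fn
               - round(b2(n), coeff_decimals) * d2_sum
               - round(b3(n), coeff_decimals) * d3)  # left-hand machine
        new_n = round((lhs - f0) / d1, n_decimals)   # right-hand machine
        if new_n == n:
            break
        n = new_n
    return n


# Example with third differences: y = 31923, coefficients to three decimals.
n_third = inverse_bessel(35263, -10844, -528 - 304, 224, 31923, 4, 3)

# Example with the throw-back: modified second differences M'' = D2 - 0.184 * D4.
m_sum = round((1248 + 1049) - 0.184 * (100 + 68))   # the text gives +2266
n_fourth = inverse_bessel(91817, -2070, m_sum, -199, 91234, 4, 4)

print(n_third, n_fourth)  # the text finds 0.3123 and 0.2321
```

The intermediate values of n differ from those in the tables of stages, because the starting value differs, but the balance is reached at the same final n in both examples.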

Limitations of Method. The method ceases to be practical when

(1) Tables with sufficient decimals in $B''$, $B'''$ and $B^{iv}$ are not available.

(2) The preliminary value of $n$ fails to give $B'''$ and $B^{iv}$ with sufficient accuracy.

The N.A. Table XXV gives, at interval 0.001 in $n$, values of $B''$, $B'''$ and $B^{iv}$ to 6, 5 and 4 decimals respectively; these suffice to the point where limitation (2) becomes effective. Then, as in all practical methods, the original table is subtabulated over the range required to tenths, hundredths or thousandths, using one of the many tables of Everett coefficients available; the method is then applied to the subtabulated values.

Conclusion .—The objection that two machines are required is 
valid only with isolated computers, and cannot be sustained by a 
computing establishment. Such an establishment, if its work 





involved much inverse interpolation would doubtless be willing to 
pay £150 for a special machine that would give results readily; the 
two machines that can be bought for this sum offer greater flexibility 
when other work is being done. 

It is no exaggeration to say that inverse interpolation involving 
higher-order differences has been shirked in the past because of its 
tediousness. It is hoped that this method will remove any excuse 
for not inverting tables when the inverted form is more convenient 
for the final user. 

Scientific Applications of the National Accounting Machine 

Introduction .—There is a distinct tendency on the part of scientists 
to design machines for their computing problems. In most cases 
only a few models of the machine are made, so that it does not become 
generally available. The present writer has made a point of 
examining closely the many excellent machines made on a large 
scale for accounting and other commercial processes, with a view 
to adapting them—with little or no change in construction—to 
scientific computing. This has the advantages that the machines 
used are the product, not of a single and perhaps not too experienced 
designer, but of groups of experts; that spare parts and expert 
service are readily available; and, above all, that others interested 
may purchase the machines at a moment's notice, and at prices 
that are economical as compared with the overhead costs of design 
and construction on a small scale. 

The story of Babbage's attempt more than a century ago to 
construct a difference engine is well known. All that remains for an 
expenditure of £17,000 of public money is an incomplete section 
in the Science Museum. The two Scheutz machines, inspired by 
Babbage's, have long since become museum exhibits. The special 
Mercedes machine that produced Bauschinger and Peters' 8-figure 
tables is lost. The machine to be described may be called a modern 
Babbage machine; it does all that Babbage intended his difference 
engine to do, and more. At a cost of £500-£600,* it is not beyond
the means of institutions where extensive computing is undertaken. 
It was first used by the British Association Mathematical Tables 
Committee, then in H.M. Nautical Almanac Office, and, more 
recently, by the National Physical Laboratory. 

Description. The National (Fig. II) is an adding machine, with a 12-column keyboard, six adding mechanisms or registers, printing sectors, and a movable carriage. Two of the registers (Nos. 1 and 3) will subtract as well as add. The fundamental property of the machine, on which its scientific usefulness depends, is that a number set on the keyboard may be entered into any register (positively or negatively in Nos. 1 and 3) or directly into any combination of the registers. Such a number is automatically printed during this process. The contents of any register may be totalled, i.e. printed with clearing or zeroising of the register, or sub-totalled, i.e. printed without clearing the register, and at the same time transferred to any other register (again positively or negatively into Nos. 1 and 3) or to any combination of the other registers.

* The cost is less for a four-register machine, or for fewer keyboard columns, or for a machine with one subtracting register only, but the fullest capacity and all the features mentioned are recommended.

[Fig. II. National Accounting Machine.]

[Fig. III. Keyboard and Controls.]

The registers that are to be active in any position of the movable 
carriage are determined by stops mounted on a form-bar on the 
front of the carriage. These stops possess, on their lower sides, 
small projections that depress the protruding ends of eight levers 
(including those for subtraction), known as hanging-bars. Stops 
are available (or can be made) with every possible combination of 
projections, so that any desired combination of registers can be 
brought into use in any given carriage position. The form-bars are 
removable, so that any bar may be left set up while another bar is 
in use. The carriage normally tabulates, or moves to the next 
position or stop, at each operation of the motor-bar, returning 
automatically at the end of the line to the beginning of the next 
line, with appropriate paper feeding. An auxiliary motor-bar, 
marked VERTICAL, arrests the tabulation, and causes vertical 
feeding of one, two or three lines, as desired. There is also a hand 
tabulation key, marked TAB. 

The control keys on the left-hand side of the keyboard (see Fig. III)
have the functions described below. They are not active in
themselves, but must be accompanied by a motor-bar operation, 
after which they are automatically released. 

( 1 ) Register selection keys, marked 1 , 2 , 3 , 4 , 5 and 6, cause the 
register selected to sub-total. Its contents will be printed and 
transferred to other registers, according to the stop that is then 
controlling the hanging-bars; if a non-add stop is in use, the contents 
are merely printed. 

(2) The NON-ADD key prevents addition in any register that is
being called for by means of its hanging-bar. It does not, however,
cancel subtraction in registers 1 and 3, if that is called for; in other
words, it is not a non-subtract key.

( 3 ) The TOTAL key, in conjunction with a register selection key, 
causes the register selected to be totalled instead of sub-totalled; 
transfer to other registers is not affected. 

( 4 ) The SUBTRACT IN 1 key causes register 1 to subtract, 
even if the stop is calling for addition in that register, and is usually 
used for casual subtraction from the keyboard. If this key is used 



in conjunction with a register selection key, it also causes totalling 
of that register. Systematic subtraction in register 1 would, of 
course, be effected by using a -1 stop; any register selected is then 
merely sub-totalled, unless the TOTAL key is specially depressed. 

( 5 ) The functions of the SUBTRACT IN 3 key are similar to 
those of the SUBTRACT IN 1 key. 

(6) The RELEASE key releases all keys depressed, whether 
control keys or part of the keyboard. 

Any number of figures may be cut off from the right or the left 
of the printing. The printing of any number may be wholly 
suppressed by holding down a small printing lever; if suppression 
in a given carriage position is desired, a small finger is set on the stop 
to depress the printing lever. 

It will be observed that the instructions to the machine what 
number is to be printed and operated with are conveyed by the 
keyboard—either by the 12 columns of setting keys in the case of a 
new number, or by the register selection keys in the case of a number 
already in the machine. The destination of the number, i.e. the 
registers in which it is to be added and subtracted, are indicated 
by the stop that is then over the hanging-bars, but with power to 
supersede the stop—either wholly or partially—by the NON-ADD 
and SUBTRACT keys. Because much of the control is given by the 
stops on the form-bar, the machine lends itself to work in which a 
cycle of operations is repeated indefinitely. The duties of the 
operator consist mainly in entering data and then depressing a 
sequence (the same in every cycle) of register selection keys in 
conjunction with the motor-bar. Under these conditions high speed 
and accurate operating are quickly attained. 

The carriage will take paper 18 inches wide, but the maximum print width is about 17 inches. It has been found convenient to standardise on two sizes of paper: foolscap (8 x 13) and double foolscap (13 x 16), used either as a 13- or as a 16-inch width.

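Application to Integration. The two principal applications of the machine are to mechanical integration from finite differences, and to its converse, differencing. We shall use the following notation:

$$
\begin{array}{lllllll}
\text{Function} & \text{First} & \text{Second} & \text{Third} & \text{Fourth} & \text{Fifth} & \text{Sixth} \\
f_{-3} \\
 & \Delta'_{-5/2} \\
f_{-2} & & \Delta''_{-2} \\
 & \Delta'_{-3/2} & & \Delta'''_{-3/2} \\
f_{-1} & & \Delta''_{-1} & & \Delta^{iv}_{-1} \\
 & \Delta'_{-1/2} & & \Delta'''_{-1/2} & & \Delta^{v}_{-1/2} \\
f_0 & & \Delta''_0 & & \Delta^{iv}_0 & & \Delta^{vi}_0 \\
 & \Delta'_{1/2} & & \Delta'''_{1/2} & & \Delta^{v}_{1/2} \\
f_1 & & \Delta''_1 & & \Delta^{iv}_1 \\
 & \Delta'_{3/2} & & \Delta'''_{3/2} \\
f_2 & & \Delta''_2 \\
 & \Delta'_{5/2} \\
f_3
\end{array}
$$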



[Table I. Integration from Sixth Differences. A note to the table reads: The registers appear in this anomalous order on the machine.]









Suppose we know the function values $f_{-3}$ to $f_2$, and the corresponding differences, and require to produce further function values from a succession of sixth differences. We apply the following equations in turn:

$$
\begin{aligned}
\Delta^{v}_{1/2} &= \Delta^{v}_{-1/2} + \Delta^{vi}_0 \\
\Delta^{iv}_1 &= \Delta^{iv}_0 + \Delta^{v}_{1/2} \\
\Delta'''_{3/2} &= \Delta'''_{1/2} + \Delta^{iv}_1 \\
\Delta''_2 &= \Delta''_1 + \Delta'''_{3/2} \\
\Delta'_{5/2} &= \Delta'_{3/2} + \Delta''_2 \\
f_3 &= f_2 + \Delta'_{5/2}
\end{aligned}
$$

To mechanise this, the six registers contain $f_2$ and the five backward differences that begin at $f_2$. $\Delta^{vi}_0$ is set on the keyboard, and added to the register containing $\Delta^{v}_{-1/2}$ to produce $\Delta^{v}_{1/2}$. This register is then added to that containing $\Delta^{iv}_0$ to produce $\Delta^{iv}_1$, and so on till $f_3$ has been produced in the function register, and the other five registers contain the five backward differences that begin at $f_3$. The full details of the operation are shown in Table I.

As a simple illustration consider the development of $x^6$ for integer values of x, from its constant sixth difference of 6! or 720. The usual arrangement of the resulting difference table would be

    x      x^6       Δ'        Δ''       Δ'''      Δ^iv      Δ^v      Δ^vi
    1         1
                      63
    2        64                 602
                     665                 2100
    3       729                2702                3360
                    3367                 5460                2520*
    4      4096                8162                5880*               720
                   11529                11340*               3240
    5     15625               19502*               9120                720
                   31031*               20460                3960
    6     46656*              39962               13080
                   70993                33540
    7    117649               73502
                  144495
    8    262144

If we know values up to x = 6, the line of backward differences marked with an asterisk is easily found. Beginning in position 2, we enter, through the keyboard, 2520, 5880 . . . 46656 in succession. Four cycles of the operations in Table I then produce

    x     Δ^vi     Δ^v     Δ^iv     Δ'''      Δ''       Δ'      f = x^6
    7     720     3240     9120    20460     39962     70993     117649
    8     720     3960    13080    33540     73502    144495     262144
    9     720     4680    17760    51300    124802    269297     531441
    10    720     5400    23160    74460    199262    468559    1000000
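The cycle of additions can be written out in a few lines. The sketch below is a model of the arithmetic only (the carriage positions and stops of Table I are not represented), and the function name is hypothetical; the registers are held in a list ordered from the fifth difference to the function.

```python
def integrate_from_differences(registers, highest_differences):
    """Build further function values from a succession of highest-order differences.

    `registers` holds the backward differences beginning at the last known
    function value, ordered [fifth, fourth, third, second, first, function].
    Each cycle adds the keyed-in difference to the first register, then each
    register in turn to the next, as in the equations of the text.
    """
    regs = list(registers)
    rows = []
    for d in highest_differences:
        carry = d
        for i in range(len(regs)):
            regs[i] += carry
            carry = regs[i]
        rows.append(tuple(regs))
    return rows


# Backward differences beginning at 6^6 = 46656, then four cycles with 720.
start = [2520, 5880, 11340, 19502, 31031, 46656]
for x, row in zip(range(7, 11), integrate_from_differences(start, [720] * 4)):
    print(x, *row)
```

Each printed row should agree with the corresponding line of the table above, ending with 10^6 in the function register.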





Each cycle takes about 10 seconds; with interruptions for renewing paper, etc., 200-300 values an hour can be produced. Negative differences are printed as complements, which at first sight might appear inconvenient, but in practice has been found to be more advantageous than otherwise. If $\Delta^{vi}$ changes sign, the stop in position 2 is reversed (which can be done in a few seconds), so that this difference is always in direct form, i.e. no complementary keyboard setting is required. If $\Delta^{vi}$ oscillates in sign from line to line, a record (by position) of the signs used may be obtained by having two stops in position 2, namely a +1 and a -1, and entering on the appropriate stop. If $\Delta^{vi}$ consists of two or more components, two or more stops (usually ±1 stops) would be used in position 2. Several prints of the function may be obtained by inserting more non-add stops after position 8, and sub-totalling register 5 on each of these.

Application to Differencing .—It is well known that a series of 
values of a dependent variable at equal intervals of the independent 
variable can be checked against accidental error by forming 
successive orders of differences until the differences vanish, or rather 
oscillate in sign, and then examining the magnitude of the oscillations. 
Thus, if the function is correct to within half a unit of its last decimal, 
the working limits of oscillations for various vanishing differences 
are approximately 

    Vanishing Difference     Third    Fourth    Fifth    Sixth
    Limits                    ±3       ±6        ±12      ±22

These will be exceeded only in rare cases where the rounding off of several consecutive values is in opposite directions, and approximately half a unit.

An error of magnitude E in the function affects n + 1 consecutive values of the nth difference, by the amounts E x the binomial coefficients of $(a - b)^n$. Thus an error of +1 in a function affects six fifth differences by the amounts +1, -5, +10, -10, +5, -1.
By means of this property errors are easily located in position, 
magnitude and sign. The work of forming differences by hand, 
or even with an ordinary calculating machine, is very laborious, 
involving much writing (one of the principal loopholes for error) 
and setting; hence it is not surprising that as many errors are often 
made in the checking process as in the function values themselves. 
Moreover an indifferent computer will often fail to detect errors in 
the leading figures of the function. Differencing is also required 
as the first stage of interpolation or subtabulation. 
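The binomial pattern of an isolated error is easily verified. In the sketch below (function names illustrative), an error of +1 is placed in an otherwise zero function and differenced five times; the result is compared with the signed binomial coefficients.

```python
from math import comb


def error_pattern(order, error=1):
    """Effect of an isolated error on order + 1 consecutive differences."""
    return [error * (-1) ** k * comb(order, k) for k in range(order + 1)]


def differences(values, order):
    """Difference a list of equally spaced values `order` times."""
    for _ in range(order):
        values = [b - a for a, b in zip(values, values[1:])]
    return values


spike = [0] * 5 + [1] + [0] * 5   # a single error of +1
print(error_pattern(5))           # [1, -5, 10, -10, 5, -1]
print(differences(spike, 5))      # the same six values
```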

The National will difference to the fifth difference a function whose values are conveyed to it by the keyboard. A rather unusual property of differences is used, as this leads to practically the same cycle of operations as in integration, with other incidental advantages.

$$
\begin{aligned}
\Delta^{v}_{1/2} &= \Delta^{iv}_1 - \Delta^{iv}_0 \\
&= \Delta'''_{3/2} - (\Delta'''_{1/2} + \Delta^{iv}_0) \\
&= \Delta''_2 - (\Delta''_1 + \Delta'''_{1/2} + \Delta^{iv}_0) \\
&= \Delta'_{5/2} - (\Delta'_{3/2} + \Delta''_1 + \Delta'''_{1/2} + \Delta^{iv}_0) \\
&= f_3 - (f_2 + \Delta'_{3/2} + \Delta''_1 + \Delta'''_{1/2} + \Delta^{iv}_0)
\end{aligned}
$$

The application of this can be studied from Table II, which is self-explanatory. At the beginning of the line the machine has been prepared, and, as soon as $f_3$ is conveyed to it through the keyboard, the above equation becomes effective. The remaining operations, which are really preparation for the following line, print the intermediate differences and repeat the print of the function. If the function is negative, it is entered negatively into register 1, i.e. by means of a -1 stop, and is printed in complementary form at the end of the line. At a change of sign, a small knob on the reversible ±1 stop is turned. The appearance of work from the machine is shown below.


    x      f(x)       Δ^v       Δ^iv      Δ'''      Δ''       Δ'       f(x)
    117   -108022    999998    999988    999796      1112    20655    891978
    118     86468         3    999991    999787       899    21554    913532
    119     64237         0    999991    999778       677    22231    935763
    120     41552         8    999999    999777       454    22685    958448
    121   - 18643    999994    999993    999770       224    22909    981357
    122   +  4263        10         3    999773    999997    22906      4263
    123     26940    999998         1    999774    999771    22677     26940
    124     49166         3         4    999778    999549    22226     49166
    125     70723         0         4    999782    999331    21557     70723
    126     91460        63        67    999849    999180    20737     91460
    127    111015    999722    999789    999638    998818    19555    111015
    128    129366       551       340    999978    998796    18351    129366
    129    146288    999457    999797    999775    998571    16922    146288
    130    161623       270        67    999842    998413    15335    161623
    131    175233    999953        20    999862    998275    13610    175233
    132    186995    999995        15    999877    998152    11762    186995
    133   +196808         7        22    999899    998051     9813    196808


Reading from right to left, we have, on one horizontal line, a function value and the successive backward differences up to the fifth that begin at that function value. If any error is made in operation (other than a setting error), the second print of the function, either on the line in which the error was made, or in the following line, will not agree with the first print. This is an exceedingly useful check, and has the further advantage of providing a spare copy, so that one print can often be used for printer's copy, while the other is retained with the differences. The second print can be repeated as often as desired by inserting non-add stops after position 8, and sub-totalling register 5 on each of these. If the function consists of several components, e.g. a ± b ± c, etc., we may use several ±1 stops in position 2; the print at the end of the line will be the combination of the entries. By repeating the process, entering the fifth differences already obtained, differences up to the tenth may be found.

[Table II. Only a note to the table is legible: Return. The carriage returns to position 1.]

There is an obvious error in the example given. In searching for this we first seek the two adjacent fifth differences whose sum is nearly zero, i.e. those on lines 128 and 129. These are the fifth differences containing +10 and -10 times the error, so that the original error is in line 126, and is about 1/10 of 551 or 543, i.e. about 54 or 55. Transposition of 0 and 6 is immediately suggested, and agrees with the direction of the error, i.e. $\Delta^{v}$ on line 126 is too great, because the (positive) function on this line is too great. Hence the function value should be 91406. The advantage, for checking purposes, of having fifth differences adjacent to the function values is evident.
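The search just described can be imitated on the printed values. The sketch below is one possible reading of the rule (find the adjacent pair of fifth differences with the widest spread and treat them as +10 and -10 times the error); the function names are hypothetical, and the function values are those of the table above with their signs restored.

```python
def backward_differences(values, order):
    """Difference `order` times; entry i lies on the line of values[i + order]."""
    diffs = list(values)
    for _ in range(order):
        diffs = [b - a for a, b in zip(diffs, diffs[1:])]
    return diffs


def locate_error_fifth(values):
    """Return (index of the suspect value, estimated error) from fifth differences."""
    d5 = backward_differences(values, 5)
    # The +10E and -10E entries are adjacent: take the pair with the widest spread.
    j = max(range(len(d5) - 1), key=lambda i: abs(d5[i] - d5[i + 1]))
    estimate = (d5[j] - d5[j + 1]) / 20
    # d5[j] is printed on the line of values[j + 5], two lines below the error.
    return j + 3, estimate


xs = list(range(117, 134))
f = [-108022, -86468, -64237, -41552, -18643, 4263, 26940, 49166, 70723,
     91460, 111015, 129366, 146288, 161623, 175233, 186995, 196808]

index, estimate = locate_error_fifth(f)
print(xs[index], estimate)        # line 126, error about 54.7

corrected = list(f)
corrected[index] = 91406          # transposition of 0 and 6, as suggested in the text
print(max(abs(d) for d in backward_differences(corrected, 5)))
```

After the correction the fifth differences all fall within the working limit of ±12 quoted earlier for a vanishing fifth difference.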

Computers using the work of this machine soon become 
accustomed to the “ geography ” of the differences in this form. A 
central second difference, i.e. one “ opposite ” a function value in 
the usual arrangement, is here one line lower; a central fourth 
difference is two lines lower, and so on. 

When it is realised that differences up to the fifth can be formed 
at the rate of 200-300 an hour, some conception of the value of the 
machine in a computing establishment may be formed. No 
computer could write the differences at half this rate, even if he knew 
them! 

Application to Formation of Moments. The formation of moments by summation when the data are at regular intervals of the independent variable is well known in statistical and actuarial science, although perhaps not employed on a large scale, because of its laboriousness. A simple example, where [f], [fx], [fx²], [fx³] and [fx⁴] are found by the multiplication process, is as follows:

    x        f      fx     fx²     fx³     fx⁴
    4       17      68     272    1088    4352
    3       19      57     171     513    1539
    2       24      48      96     192     384
    1       31      31      31      31      31
    Sums    91     204     570    1824    6306

The summation process is here applicable in its simplest form.

                             Summations
    x        f       I      II      III          IV           V
    4       17      17      17       17           17           17 (x 1)
    3       19      36      53       70           87 (x 1)    104 (x 11)
    2       24      60     113      183 (x 1)    270 (x 4)    374 (x 11)
    1       31      91     204      387 (x 1)    657 (x 1)   1031 (x 1)
    Sums            91     204      570         1824         6306

If the values of x are at unit interval, and if they terminate at x = 1 (adjustments for other cases are easily made), the last value in the first summation column is obviously [f], and that in the next column [fx]. [fx²] is obtained from the third summation by adding the last two values; [fx³] from the fourth summation by applying to the last three values the multipliers 1, 4 and 1; [fx⁴] from the last four values of the fifth summation and the multipliers 1, 11, 11 and 1. This property, a proof of which must be sought elsewhere, holds for any number of values of x. The summation process is, of course, simply the mechanical integration process. Starting with all registers empty, we enter the successive values of f as fifth differences, using five registers only.
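The summation rule can be checked against the multiplication process in a few lines. This is a sketch under the conditions stated in the text (unit interval, values listed from the largest x down to x = 1); the function name is hypothetical, and leading zeros are prepended only so that short series can be handled.

```python
from itertools import accumulate


def moments_by_summation(f):
    """Return ([f], [fx], [fx^2], [fx^3], [fx^4]) by five successive summations.

    `f` must be listed from the largest x down to x = 1, at unit interval.
    """
    column = [0, 0, 0] + list(f)        # padding does not alter the running sums
    sums = []
    for _ in range(5):
        column = list(accumulate(column))
        sums.append(column)
    s1, s2, s3, s4, s5 = sums
    return (
        s1[-1],
        s2[-1],
        s3[-1] + s3[-2],                              # multipliers 1, 1
        s4[-1] + 4 * s4[-2] + s4[-3],                 # multipliers 1, 4, 1
        s5[-1] + 11 * s5[-2] + 11 * s5[-3] + s5[-4],  # multipliers 1, 11, 11, 1
    )


f = [17, 19, 24, 31]                    # values of f for x = 4, 3, 2, 1
print(moments_by_summation(f))
# Multiplication process, for comparison.
print(tuple(sum(v * x ** k for v, x in zip(f, (4, 3, 2, 1))) for k in range(5)))
```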

For finding moments up to the second only, as in forming means 
and standard deviations, four registers only are used—three for the 
summations and the fourth to add the last two values in the third 
summation column without further setting. If there are positive 
and negative values of x (as well as x = 0), all six registers may be 
used to give the three final results desired, namely [/], [fx] and 
[fx 2 ]. As this affords a good illustration of the way in which a 
set-up is worked out, it is given in full, together with its application 
to the frequency distribution on page lxxv of Pearson’s Tables for 
Statisticians and Biometndans, Volume I. 


Position   Stop            Operation                         Print

   1       Non-add         Set x                             x
   2       1+4             Set f                             f
   3       2±3             Sub-total 1                       First summation of f
   4       6               Sub-total 2                       Second summation of f
   5       5 and return    Normally return (S.T.6 or T.6)    Contributions to [fx^2]


The stop in position 5 normally causes the carriage to return
after leaving position 4. By turning the return lever through 45°
it becomes ineffective, and the stop comes into operation as a normal
register 5 stop. Begin with positive $x$'s, the reversible stop in position
3 being set to 2+3. Operate through positions 1 to 4, down to and





including the line $x = +3$. On the line $x = +2$, sub-total register 6
in position 5. On the line $x = +1$, after setting $f$, total registers 1,
2 and 6 in succession. Change the reversible stop to 2−3, and do
the negative $x$'s, down to and including the line $x = -3$. On the
line $x = -2$, sub-total register 6 in position 5. On the line $x = -1$,
it is immaterial whether register 1 is sub-totalled or totalled, but
registers 2 and 6 are totalled. On the next line change the reversible
stop to 2+3, skip position 1, set $f_0$, and enter in position 2 with the
VERTICAL bar. Total register 4 to give $[f]$. Then total register 3,
with the NON-ADD key depressed, to give $[fx]$. Then total register
1, with the NON-ADD key depressed and the printing suppressed,
to clear that register. Finally total register 5 to give $[fx^2]$. All
registers are now clear.
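
The arithmetic behind this set-up (not the key-strokes themselves) can be sketched in Python as follows. The frequencies are those of the Pearson distribution tabulated with this paper; the two blank entries at $x = \pm 17$ are taken as zero, which is an assumption here, although the printed running sums bear it out. The helper name `one_side` is hypothetical.

```python
from itertools import accumulate

def one_side(freqs):
    """Sum f, f|x| and f x^2 for one side of the origin.

    `freqs` runs from the largest |x| down to |x| = 1.
    """
    s1 = list(accumulate(freqs))   # first summation
    s2 = list(accumulate(s1))      # second summation
    s3 = list(accumulate(s2))      # running total kept in register 6
    # the last two values of the third summation are added for [fx^2]
    return s1[-1], s2[-1], s3[-1] + s3[-2]

# x = +19, +18, ..., +1 (blank at x = 17 assumed to be 0)
pos = [1, 1, 0, 6, 3, 8, 13, 14, 24, 20, 30, 35, 43, 56, 66, 66, 83, 79, 102]
# x = -20, -19, ..., -1 (blank at x = -17 assumed to be 0)
neg = [1, 1, 2, 0, 3, 3, 5, 7, 12, 13, 17, 28, 24, 43, 53, 57, 55, 68, 83, 85]
f0 = 96                            # frequency at x = 0

n_pos, m1_pos, m2_pos = one_side(pos)
n_neg, m1_neg, m2_neg = one_side(neg)

sum_f = n_pos + n_neg + f0         # [f]
sum_fx = m1_pos - m1_neg           # [fx]: the negative side is subtracted
sum_fx2 = m2_pos + m2_neg          # [fx^2]

print(sum_f, sum_fx, sum_fx2)      # 1306 572 47414
```

The three results agree with the totals printed at the foot of the machine sheet.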

Application to Subtabulation. Multi-register machines lend themselves
to subtabulation, i.e. to systematic interpolation to fixed
intervals, especially fifths and tenths. The use of two-register
machines for the simple case of interpolation to tenths when third
differences are negligible has been described previously.* The use
of the National for subtabulation from fourth differences was
described by the author in his Newmarch Lectures early in 1933,
but is here published for the first time. The theory of similar
methods has also been developed by E. C. Bower,† but his methods
appear to exceed both the keyboard and the register capacity of all
machines available; also they ignore the advantages of the throw-back.

Methods of subtabulation in which leading differences of the
interpolates are computed for each original interval, and a constant
higher-order difference applied till the next pivotal value is reached,
have been practised at least since the days when Prony worked on
the still unpublished Tables du Cadastre at the end of the eighteenth
century. Their defects are the large number of extra decimals
required, the necessity for calculating leading differences at frequent
intervals, failure to reproduce the pivotal values exactly, and the
ease with which errors in the building-up stages can escape detection.
Concerning this latter point, Hayashi's tables are full of examples
of the treacherousness of this method, while Davis in his Mathematical
Tables, Volume I, has an error of one unit in the fourth
decimal of nine consecutive values of a 10-decimal function, directly

* L. J. Comrie, "The Nautical Almanac Office Burroughs Machine,"
Monthly Notices of the Royal Astronomical Society, 92, 537 (1932). This process
can also be used, without printing, on the Brunsviga Dupla, and on the
Brunsviga IVa and 20; see Monthly Notices, 88, 447 (1928) and 91, 819 (1931).
The modern six-register Burroughs machines cannot transfer directly from one
register to another; transfer can be effected only through an intermediate
mechanism, and only to one register at a time; in some models many idle
"spacing" strokes are required.

† Lick Observatory Bulletins 455 and 467.



     1        2        3         4          5

   +19        1        1         1
    18        1        2         3
    17                 2         5
    16        6        8        13
    15        3       11        24
    14        8       19        43
    13       13       32        75
    12       14       46       121
    11       24       70       191
    10       20       90       281
     9       30      120       401
     8       35      155       556
     7       43      198       754
     6       56      254      1008
     5       66      320      1328
     4       66      386      1714
     3       83      469      2183
     2       79      548      2731      11432
    +1      102      650      3381      14813
   -20        1        1         1
    19        1        2         3
    18        2        4         7
    17                 4        11
    16        3        7        18
    15        3       10        28
    14        5       15        43
    13        7       22        65
    12       12       34        99
    11       13       47       146
    10       17       64       210
     9       28       92       302
     8       24      116       418
     7       43      159       577
     6       53      212       789
     5       57      269      1058
     4       55      324      1382
     3       68      392      1774
     2       83      475      2249       9180
    -1       85      560      2809      11989
             96
           1306      572                47414
          = [f]    = [fx]             = [fx^2]

due to the use of this method. Sang,* in a well-deserved condemnation
of Prony's organization, says "The method followed in
the calculation of the Cadastre table of logarithms was an egregious
blunder. The result was in accordance with the method."

* Proceedings of the Royal Society of Edinburgh, 8, 421 (1874).

These defects tend to disappear when machines with large adding
capacity and many registers are available, particularly if they are





able to print their results. A great improvement is effected, however,
by computing, not the leading differences in each interval, but the 
missing or “bridging” higher-order differences between successive 
sets of constant values. In the methods that have been developed 
for use with the National machine, the following conditions are 
always satisfied: 

(1) The process of forming interpolates by integration from a
    certain higher-order difference is continuous after the
    original leading differences have been found.

(2) The pivotal values are reproduced exactly.

(3) The number of extra or fictitious decimals is as small as
    possible.

(4) The method of computing the required higher-order bridging
    differences is as simple as possible.

The problem of interpolating to tenths when fourth differences
are less than 1000 will illustrate the general principles of the methods
of subtabulation now available for use with multi-register machines.
If we take Everett's formula in the form

$$f_n = (1 - n) f_0 + n f_1 + E_0'' M_0'' + E_1'' M_1''$$

where

$$M'' = \Delta'' - 0.184\,\Delta^{iv}$$

and write the complete expressions for successive interpolates, we
have

$$
\begin{aligned}
f_{0.0} &= 1.0 f_0 \\
f_{0.1} &= 0.9 f_0 + 0.1 f_1 - 0.0285 M_0'' - 0.0165 M_1'' \\
f_{0.2} &= 0.8 f_0 + 0.2 f_1 - 0.0480 M_0'' - 0.0320 M_1'' \\
        &\;\;\vdots \\
f_{0.7} &= 0.3 f_0 + 0.7 f_1 - 0.0455 M_0'' - 0.0595 M_1'' \\
f_{0.8} &= 0.2 f_0 + 0.8 f_1 - 0.0320 M_0'' - 0.0480 M_1'' \\
f_{0.9} &= 0.1 f_0 + 0.9 f_1 - 0.0165 M_0'' - 0.0285 M_1'' \\
f_{1.0} &= 1.0 f_1 \\
f_{1.1} &= 0.9 f_1 + 0.1 f_2 - 0.0285 M_1'' - 0.0165 M_2'' \\
f_{1.2} &= 0.8 f_1 + 0.2 f_2 - 0.0480 M_1'' - 0.0320 M_2''
\end{aligned}
$$

The fourth differences of these are zero, except at 0.9, 1.0 and 1.1.
Calling these values $\delta^{iv}_{0.9}$, $\delta^{iv}_{1.0}$ and $\delta^{iv}_{1.1}$, we find, by differencing,

$$
\begin{aligned}
\delta^{iv}_{0.9} = \delta^{iv}_{1.1} &= -0.0165\,(\Delta_1^{iv} - 0.184\,\Delta_1^{vi}) + 0.1 \times 0.184\,\Delta_1^{iv} \\
\delta^{iv}_{1.0} &= +0.0340\,(\Delta_1^{iv} - 0.184\,\Delta_1^{vi}) - 0.2 \times 0.184\,\Delta_1^{iv}
\end{aligned}
$$





where the quantity $0.184\,\Delta^{vi}$ is, if $0.184\,\Delta^{iv}$ is rounded off, the second
difference of the rounded-off values of $0.184\,\Delta^{iv}$.

In order to reduce these bridging differences to three decimals
(the unit being the last decimal of the quantities that are being
subtabulated), $M''$ must be even always. To secure this we make the
rounded-off quantity $0.184\,\Delta^{iv}$ even when $\Delta''$ is even, and vice versa.
Calling this quantity $D^{iv}$, and its second difference $D^{vi}$, and writing
the bridging differences in units of the third decimal,

$$
\begin{aligned}
\delta^{iv}_{0.9} = \delta^{iv}_{1.1} &= -16.5\,(\Delta_1^{iv} - D_1^{vi}) + 100\,D_1^{iv} = A \\
\delta^{iv}_{1.0} &= 34\,(\Delta_1^{iv} - D_1^{vi}) - 200\,D_1^{iv} = B
\end{aligned}
$$
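
The statement that only three fourth differences survive at each pivotal value can be confirmed with exact fractions. The sketch below is illustrative only: it borrows three pivotal values and the corresponding $M''$ from the worked example given later in the paper, and the helper names `everett2` and `interpolate` are introduced here, not taken from the source.

```python
from fractions import Fraction as Fr

def everett2(n):
    """Everett second-difference coefficients (E0'', E1'') at fraction n."""
    m = 1 - n
    return m * (m * m - 1) / 6, n * (n * n - 1) / 6

# Pivotal values at x = 0, 10, 20 and M'' = second difference - D^iv
# (56375 - 25, 58833 - 27, 61436 - 28), taken from the later example.
f = [41014199, 43041285, 45127204]
M = [56350, 58806, 61408]

def interpolate(t):
    """Interpolate at t tenths of the original interval (t = 0 .. 20)."""
    k, j = divmod(t, 10)
    if j == 0:
        return Fr(f[k])
    n = Fr(j, 10)
    e0, e1 = everett2(n)
    return (1 - n) * f[k] + n * f[k + 1] + e0 * M[k] + e1 * M[k + 1]

vals = [interpolate(t) for t in range(21)]

# Central fourth differences of the interpolates, keyed by tenths.
fourth = {t: vals[t - 2] - 4 * vals[t - 1] + 6 * vals[t] - 4 * vals[t + 1] + vals[t + 2]
          for t in range(2, 19)}
nonzero = {t: v for t, v in fourth.items() if v != 0}

# A and B from the formulas above, with argument 145 - (-1) = 146 and D^iv = 27.
A = Fr(-33, 2) * 146 + 100 * 27
B = 34 * 146 - 200 * 27

print(nonzero)   # entries only at t = 9, 10, 11
print(A, B)      # 291 -436, in units of the third extra decimal
```

The surviving fourth differences are $A/1000$, $B/1000$ and $A/1000$ in units of the last decimal of $f$, in agreement with the expressions for $A$ and $B$.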

To facilitate the calculation of these, a special table has been
constructed with arguments $\Delta^{iv} - D^{vi}$ (always even) and $D^{iv}$, the
latter being taken as 0.184 times the former. Specimens of the
table, showing portions required in an illustration to follow, are
given below. If for any particular value of $\Delta^{iv} - D^{vi}$ the tabular
value of $D^{iv}$ is not that required, the entries of the table are corrected
by adding 100 and 200 for each unit that $D^{iv}$ is in excess of the tabular
value, and vice versa. Thus we should have


A lv -D vi 

D 1Y 

A 

B 

234 

41 

239 

244 

234 

42 

339 

444 

234 

43 

439 

644 = tabular value 

234 

44 

539 

844 

234 

45 

639 

1044 


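The rule of signs, for the actual entries in the table, is that $A$ has the
sign of $\Delta^{iv}$, and $B$ the opposite sign.

The leading differences, in units of the third extra decimal, are
easily shown to be

$$
\begin{aligned}
\text{Leading } \delta' &= 100\,\Delta'_{1/2} - 28.5\,M_0'' - 16.5\,M_1'' \\
                        &= -100 f_0 + 100 f_1 - 28.5\,M_0'' - 16.5\,M_1'' \\
\text{Leading } \delta'' &= 9 M_0'' + M_1'' \\
\text{Leading } \delta''' &= -M_0'' + M_1'' = \Delta'''_{1/2} - D^{v}_{1/2}
\end{aligned}
$$

where

$$M'' = \Delta'' - D^{iv}$$

and is always even.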

The following example shows the mode of preparation. The
column $D^{iv}$ is first entered, using the first column of the auxiliary
table as argument, and the second as respondent, and choosing the




nearest odd or even value according as $\Delta''$ is odd or even. Thus for
$\Delta^{iv} = 134$ and $\Delta''$ odd, we get $D^{iv} = 25$ immediately. For $\Delta^{iv} = 145$
and $\Delta''$ odd, we choose $D^{iv} = 27$. For $\Delta^{iv} = 157$ and $\Delta''$ even, we
choose $D^{iv} = 28$, which is evidently nearer than $D^{iv} = 30$. The
first values of $A$ and $B$ are taken directly from the table with arguments
$145 - (-1)$ and 27, i.e. 146 and 27. For the next pair the
arguments are 156 and 28, but as the table gives 156 and 29 we
must subtract 100 and 200 from the tabular values.
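
The choice of $D^{iv}$ and the correction of the tabular entries can be sketched as follows. This is an illustration in Python of the rule just stated, not the auxiliary table itself; `choose_D` and `bridging` are names introduced here, and the signed values returned follow the formulas for $A$ and $B$ (the printed table gives magnitudes only).

```python
from fractions import Fraction as Fr

def choose_D(delta4, delta2):
    """Integer nearest to 0.184 * delta4 with the same parity as delta2."""
    target = Fr(184 * delta4, 1000)
    base = round(target)
    if (base - delta2) % 2 == 0:
        return base
    # wrong parity: take the neighbour lying on the same side as the target
    return base + 1 if target > base else base - 1

def bridging(arg, D):
    """Signed A and B for argument arg = (fourth difference) - D^vi, which is even."""
    assert arg % 2 == 0
    return (-33 * arg) // 2 + 100 * D, 34 * arg - 200 * D

print(choose_D(134, 56375))   # 25
print(choose_D(145, 58833))   # 27
print(choose_D(157, 61436))   # 28

tabular_D = round(Fr(184 * 156, 1000))   # the table pairs 156 with 29
print(bridging(156, tabular_D))          # (326, -496), the tabular entry
print(bridging(156, 28))                 # (226, -296), i.e. 100 and 200 less in magnitude
```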

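To calculate the leading differences, we have

$$M_0'' = 56375 - 25 = 56350 \qquad M_1'' = 58833 - 27 = 58806$$

whence

$$
\begin{aligned}
\text{Leading } \delta' &= 200\ 132\ 326 \\
\text{Leading } \delta'' &= 565\ 956 \\
\text{Leading } \delta''' &= 2\ 456
\end{aligned}
$$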

The division of these into groups of three corresponds to the colour
scheme of the keyboard. If any of the leading differences are
negative, they will appear in complementary form when computed
by the formula given, and are conveyed to the machine in that form.
The appearance of the machine work for the first 30 values is shown.
It will be noted that 5 has been added to the first fictitious decimal
of the interpolates, so that they are thus automatically rounded off
to the same number of decimals as in the original values. Each
pivotal value is checked as it is formed; if there is an error, the
value of $\delta'''$ that appears eight times since the last pivotal value is
compared with $\Delta''' - D^{v}$; any lack of agreement indicates that $A$
and $B$, as entered, are not correct. The two otherwise idle registers
may be used to develop the argument. The speed of this process is
300-400 values an hour. It will be recalled that any number of
copies of the last column may be made.
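
The whole process can be imitated in integer arithmetic. The sketch below is a Python illustration under the formulas given above, not a transcript of the machine set-up: the pivotal values, $\Delta''$, $\Delta^{iv}$ and $D^{iv}$ are copied from the example table, while the variable names and the order in which the running differences are updated are choices made here.

```python
# Pivotal values and differences at x = 0, 10, ..., 100, from the example.
f  = [41014199, 43041285, 45127204, 47274559, 49486110, 51764784,
      54113688, 56536124, 59035607, 61615876, 64280917]
d2 = [56375, 58833, 61436, 64196, 67123, 70230, 73532, 77047, 80786, 84772, 89026]
d4 = [134, 145, 157, 167, 180, 195, 213, 224, 247, 268, 288]
D  = [25, 27, 28, 30, 33, 36, 40, 41, 46, 50, 54]

M = [a - b for a, b in zip(d2, D)]      # modified second differences
assert all(m % 2 == 0 for m in M)       # M'' must always be even

def bridging(k):
    """Signed A and B at pivotal value k, in units of the third extra decimal."""
    Dvi = D[k - 1] - 2 * D[k] + D[k + 1]
    arg = d4[k] - Dvi
    return (-33 * arg) // 2 + 100 * D[k], 34 * arg - 200 * D[k]

# Fourth differences entered on lines 10k+1, 10k+2, 10k+3; zero elsewhere.
entries = {}
for k in range(1, len(f) - 1):
    A, B = bridging(k)
    entries[10 * k + 1], entries[10 * k + 2], entries[10 * k + 3] = A, B, A

# Leading differences (28.5 and 16.5 written as 57/2 and 33/2).
delta1 = 100 * (f[1] - f[0]) - (57 * M[0] + 33 * M[1]) // 2
delta2 = 9 * M[0] + M[1]
delta3 = M[1] - M[0]

value = 1000 * f[0] + 500               # 5 added to the first fictitious decimal
table = [value]
for row in range(1, 10 * (len(f) - 1) + 1):
    delta3 += entries.get(row, 0)
    if row >= 3:                        # the leading third difference belongs to line 3
        delta2 += delta3
    if row >= 2:                        # the leading second difference belongs to line 2
        delta1 += delta2
    value += delta1
    table.append(value)

interpolates = [v // 1000 for v in table]   # rounded to the original decimals
print(table[1], table[10], table[30])
```

Every tenth entry of `interpolates` reproduces the corresponding pivotal value, which is condition (2) above; the first 30 lines of `table` can be compared with the machine work printed with the paper.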


   x        f(x)         Δ''      Δ^iv    D^iv    D^v    D^vi     A      B
             +            +        +       +       +              +      −

   0     41 014 199     56 375     134     25
                                                    2
  10     43 041 285     58 833     145     27             −1     291    436
                                                    1
  20     45 127 204     61 436     157     28             +1     226    296
                                                    2
  30     47 274 559     64 196     167     30             +1     261    356
                                                    3
  40     49 486 110     67 123     180     33              0     330    480
                                                    3
  50     51 764 784     70 230     195     36             +1     399    604
                                                    4
  60     54 113 688     73 532     213     40             −3     436    656
                                                    1
  70     56 536 124     77 047     224     41             +4     470    720
                                                    5
  80     59 035 607     80 786     247     46             −1     508    768
                                                    4
  90     61 615 876     84 772     268     50              0     578    888
                                                    4
 100     64 280 917     89 026     288     54





Specimens of the auxiliary table:

 Δ^iv − D^vi    D^iv     A      B          Δ^iv − D^vi    D^iv     A      B

     134         25     289    444             194         36     399    604
     144         26     224    304             196         36     366    536
     146         27     291    436             212         39     402    592
     154         28     259    364             214         39     369    524
     156         29     326    496             216         40     436    656
     158         29     293    428             218         40     403    588
     160         29     260    360             220         40     370    520
     162         30     327    492             222         41     437    652
     164         30     294    424             224         41     404    584
     166         31     361    556             246         45     441    636
     168         31     328    488             248         46     508    768
     170         31     295    420             268         49     478    688
     180         33     330    480             270         50     545    820

Machine work for the first 30 values:

  x    δ^iv     δ'''       δ''          δ'               f(x)

  0                                                41 014 199 500
  1                                 200 132 326    41 214 331 826
  2                     565 956     200 698 282    41 415 030 108
  3           2 456     568 412     201 266 694    41 616 296 802
  4           2 456     570 868     201 837 562    41 818 134 364
  5           2 456     573 324     202 410 886    42 020 545 250
  6           2 456     575 780     202 986 666    42 223 531 916
  7           2 456     578 236     203 564 902    42 427 096 818
  8           2 456     580 692     204 145 594    42 631 242 412
  9           2 456     583 148     204 728 742    42 835 971 154
 10           2 456     585 604     205 314 346    43 041 285 500
 11    291    2 747     588 351     205 902 697    43 247 188 197
 12    436    2 311     590 662     206 493 359    43 453 681 556
 13    291    2 602     593 264     207 086 623    43 660 768 179
 14           2 602     595 866     207 682 489    43 868 450 668
 15           2 602     598 468     208 280 957    44 076 731 625
 16           2 602     601 070     208 882 027    44 285 613 652
 17           2 602     603 672     209 485 699    44 495 099 351
 18           2 602     606 274     210 091 973    44 705 191 324
 19           2 602     608 876     210 700 849    44 915 892 173
 20           2 602     611 478     211 312 327    45 127 204 500
 21    226    2 828     614 306     211 926 633    45 339 131 133
 22    296    2 532     616 838     212 543 471    45 551 674 604
 23    226    2 758     619 596     213 163 067    45 764 837 671
 24           2 758     622 354     213 785 421    45 978 623 092
 25           2 758     625 112     214 410 533    46 193 033 625
 26           2 758     627 870     215 038 403    46 408 072 028
 27           2 758     630 628     215 669 031    46 623 741 059
 28           2 758     633 386     216 302 417    46 840 043 476
 29           2 758     636 144     216 938 561    47 056 982 037
 30           2 758     638 902     217 577 463    47 274 559 500





Every change of circumstance, i.e. of number of interpolates in 
each interval or difference that can be neglected, gives opportunity 
for skill in devising the best working form of the general principles. 
Two other cases will be briefly summarised, each of interpolation to 
tenths. 

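If fifth differences are less than 500, we use four extra decimals,
and integrate from sixth differences, five of which in each interval
are zero. To prepare the remaining five, let

$$
\begin{aligned}
S^{iv}_{1/2} &= \tfrac{1}{2}\,(\Delta_0^{iv} + \Delta_1^{iv}) \text{ to the nearest multiple of 8} \\
A &= \tfrac{1}{2}\,(S^{iv}_{1/2} + S^{iv}_{3/2}) - \Delta_1^{iv} \\
B &= S^{iv}_{3/2} - S^{iv}_{1/2}
\end{aligned}
$$

$A$ is always small, being of the order $\tfrac{1}{4}\Delta^{vi}$. $B$ is necessarily a
multiple of 8.

Then, in units of the fourth extra decimal,

$$
\begin{aligned}
\delta^{vi}_{0.8} &= 165A - 33 \times \tfrac{1}{8}B \\
\delta^{vi}_{0.9} &= -670A + 70 \times \tfrac{1}{8}B \\
\delta^{vi}_{1.0} &= 1010A \\
\delta^{vi}_{1.1} &= -670A - 70 \times \tfrac{1}{8}B \\
\delta^{vi}_{1.2} &= 165A + 33 \times \tfrac{1}{8}B
\end{aligned}
$$

for which simple tables with arguments $A$ and $B$ have been prepared.
The leading differences, in the same units, are

$$
\begin{aligned}
\text{Leading } \delta' &= 1000\,\Delta'_{1/2} - 285\,\Delta_0'' - 165\,\Delta_1'' + 78.375\,S^{iv}_{1/2} \\
\text{Leading } \delta'' &= 90\,\Delta_0'' + 10\,\Delta_1'' - 12.75\,S^{iv}_{1/2} \\
\text{Leading } \delta''' &= -10\,\Delta_0'' + 10\,\Delta_1'' - 3.5\,S^{iv}_{1/2} \\
\text{Leading } \delta^{iv} &= S^{iv}_{1/2} \\
\text{Leading } \delta^{v} &= 0
\end{aligned}
$$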

If sixth differences are less than 10,000, i.e. within the throw-back
limit, the normal process would require 8 extra decimals, which
could be reduced to 6 by making $M^{iv}$ ($= \Delta^{iv} - 0.207\,\Delta^{vi} + 0.045\,\Delta^{viii}$)
a multiple of 4. This would leave insufficient capacity on the keyboard,*
so a process requiring 9 extra decimals, but no preparation
of the bridging differences, has been adopted. This yields the last
2 decimals of the interpolates, which are rounded off in the usual
way, i.e. by adding 5 to the first rejected figure. By differencing
these 2-figure values we obtain the last two figures of the fifth
difference, i.e. the whole of that difference. By integration from
these differences (or, in general, from the fourth) we obtain the
complete interpolates.

* 12-column machines have only recently become available; the three 
machines now in use have 11 columns only. 



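If*

$$M^{iv} = 125\,\Delta^{iv} - 26\,\Delta^{vi}$$

and if all quantities are in units of the ninth extra decimal,

$$
\begin{aligned}
\text{Leading } \delta' &= 100{,}000{,}000\,\Delta'_{1/2} - 28{,}500{,}000\,\Delta_0'' - 16{,}500{,}000\,\Delta_1'' \\
                        &\quad + 36{,}366\,M_0^{iv} + 26{,}334\,M_1^{iv} \\
\text{Leading } \delta'' &= 9{,}000{,}000\,\Delta_0'' + 1{,}000{,}000\,\Delta_1'' - 8{,}220\,M_0^{iv} - 1{,}980\,M_1^{iv} \\
\text{Leading } \delta''' &= -1{,}000{,}000\,\Delta_0'' + 1{,}000{,}000\,\Delta_1'' - 900\,M_0^{iv} - 1{,}900\,M_1^{iv} \\
\text{Leading } \delta^{iv} &= 640\,M_0^{iv} + 160\,M_1^{iv} \\
\text{Leading } \delta^{v} &= -80\,M_0^{iv} + 80\,M_1^{iv}
\end{aligned}
$$

$$
\begin{aligned}
\delta^{vi}_{0.8} = \delta^{vi}_{1.2} &= -140{,}250\,\Delta_1^{vi} - 684{,}684\,\Delta_1^{viii} \\
\delta^{vi}_{0.9} = \delta^{vi}_{1.1} &= +521{,}500\,\Delta_1^{vi} + 2{,}790{,}216\,\Delta_1^{viii} \\
\delta^{vi}_{1.0} &= -752{,}500\,\Delta_1^{vi} - 4{,}213{,}144\,\Delta_1^{viii}
\end{aligned}
$$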


Tables of the first 10,000 multiples of the three factors of $\Delta^{vi}$ and of
the first 1000 multiples of the three factors of $\Delta^{viii}$ have been made.
This particular process has been extensively used in the interpolation
of 10-figure Bessel functions at interval 0.1, where the original $\Delta^{vi}$
is usually a 4-figure number. A method of interpolating to tenths
when the eighth difference is less than 100,000 has also been
successfully applied.

Application to Constant Multipliers. If a series of values at
regular intervals of an independent variable were multiplied by a 
constant c, without the rejection of any decimals from the products, 
the differences of the products would be c times the differences of 
the original values. If, therefore, the original values be differenced 
till small or oscillating differences are reached, the desired products 
may be formed by integration from c times these differences. If the 
keyboard capacity is insufficient, it will generally be possible to get 
at least the last two decimals required, difference these, and then 
produce the complete values by a further integration. 
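
The principle that the products may be built up by integration from $c$ times the differences can be illustrated in a few lines of Python. The series and the constant below are invented for the purpose, and the helper names `differences` and `integrate` are hypothetical; forward differences are used for simplicity.

```python
from itertools import accumulate

def differences(values, order):
    """Leading entries of orders 0..order-1, plus the whole column of order `order`."""
    columns = [list(values)]
    for _ in range(order):
        prev = columns[-1]
        columns.append([b - a for a, b in zip(prev, prev[1:])])
    leading = [col[0] for col in columns[:order]]
    return leading, columns[order]

def integrate(leading, top):
    """Rebuild the series from its leading entries and top-order differences."""
    column = list(top)
    for lead in reversed(leading):
        column = list(accumulate(column, initial=lead))
    return column

# Invented 6-figure values at regular intervals and an invented 5-figure constant.
values = [100000 + 37 * i + 5 * i * i + i ** 3 for i in range(12)]
c = 31416

leading, third = differences(values, 3)
products = integrate([c * v for v in leading], [c * d for d in third])

print(products == [c * v for v in values])   # True
```

Because differencing and summation are linear, the check holds for any order of difference; the order is chosen in practice so that the top differences are small or oscillating.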

If the original third differences oscillate between ±3, three
registers only are required for the integration, and the other three
can contain $c$, $2c$ and $3c$. Thus no setting is required, as the new
third differences are transferred from these registers. If the
original fourth differences oscillate between ±6, two registers only
are available for multiples of $c$. Hence we put $2c$ and $3c$ in
registers 5 and 6 respectively, and enter the new fourth difference in
two components, operating from the table shown below, with a similar
table for negative fourth differences.

  Δ^iv    First Stop    Second Stop

   +1        −5             +6
   +2        +5            Skip
   +3        +6            Skip
   +4        +5             +5
   +5        +5             +6
   +6        +6             +6


* The coefficient of the throw-back from $\Delta^{vi}$ to $\Delta^{iv}$ has been taken as
−0.208 instead of −0.207.




This process has been applied to a run of 360 6-figure values that
had to be multiplied by about 50 different 5-figure constants.

Application to Solution of Differential Equations. The use of the
machine for this purpose is still in its infancy. Successful experiments
have been made with a first-order equation of the form

$$\frac{dy}{dx} = f(x, y)$$

using the Adams process. Denoting the interval by $w$, and $w\,dy/dx$ by
$q$, we find the first values by a series expansion, and then have

Argument   Function    q                     Differences

  x_0        y_0      q_0
                             Δ'_{1/2}
  x_1        y_1      q_1               Δ''_1
                             Δ'_{3/2}             Δ'''_{3/2}
  x_2        y_2      q_2               Δ''_2                  Δ^iv_2
                             Δ'_{5/2}             Δ'''_{5/2}
  x_3        y_3      q_3               Δ''_3
                             Δ'_{7/2}
  x_4        y_4      q_4



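The fundamental equation is

$$
\begin{aligned}
y_5 &= y_4 + q_4 + \tfrac{1}{2}\Delta'_{7/2} + \tfrac{5}{12}\Delta''_3 + \tfrac{3}{8}\Delta'''_{5/2} + \tfrac{251}{720}\Delta^{iv}_2 + \dots \\
    &= y_4 + q_4 + [C\Delta]
\end{aligned}
$$

The machine is set up to difference $q$ to the fourth difference,
thus leaving one register free for the formation of $y$. $q$ and $[C\Delta]$
are formed by an auxiliary machine, say a Brunsviga.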
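
The fundamental equation can be tried out in a short Python sketch. The coefficients are the standard Adams backward-difference multipliers up to the fourth difference; the test equation $dy/dx = -y$, the interval and the use of the exact solution in place of a series expansion for the starting values are all assumptions made for illustration, and the function names are hypothetical.

```python
import math

# Multipliers of the first to fourth backward differences of q.
COEFFS = [1 / 2, 5 / 12, 3 / 8, 251 / 720]

def backward_differences(q):
    """Backward differences of orders 1 to 4 ending at the last of five q values."""
    out = []
    column = list(q)
    for _ in range(4):
        column = [b - a for a, b in zip(column, column[1:])]
        out.append(column[-1])
    return out

def adams(fun, x0, y_start, w, steps):
    """Advance `steps` intervals of width w from five starting values y_start."""
    ys = list(y_start)
    xs = [x0 + i * w for i in range(len(ys))]
    qs = [w * fun(x, y) for x, y in zip(xs, ys)]
    for _ in range(steps):
        c_delta = sum(c * d for c, d in zip(COEFFS, backward_differences(qs[-5:])))
        ys.append(ys[-1] + qs[-1] + c_delta)     # y_5 = y_4 + q_4 + [C delta]
        xs.append(xs[-1] + w)
        qs.append(w * fun(xs[-1], ys[-1]))
    return xs, ys

w = 0.05
start = [math.exp(-i * w) for i in range(5)]     # stand-in for a series expansion
xs, ys = adams(lambda x, y: -y, 0.0, start, w, 50)
error = abs(ys[-1] - math.exp(-xs[-1]))
print(len(ys), error)
```

On the machine the differencing of $q$ is done by the registers, and only $q$ and $[C\Delta]$ are supplied from outside, as the set-up that follows shows.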


Position   Stop      Operation       Print

   1       N.A.      Set argument    Argument
   2       5         Set [CΔ]        [CΔ]
   3       N.A.      Sub-total 5     y
   4       ±1        Set q           q
   5       3         Total 1         Δ^iv
   6       −1+4      Sub-total 3     Δ'''
   7       −1+2      Sub-total 4     Δ''
   8       −1+6      Sub-total 2     Δ'
   9       −1+5      Sub-total 6     q


There is no writing by hand at any stage. In a test case, working 
to 10 decimals, the error after 50 stages amounted to only 4 units 
of the tenth decimal. The application to higher-order, simultaneous 
and partial differential equations is a field still to be explored. 

Conclusion. It is believed that an extremely powerful tool has
been added to the computer's equipment, and that large programmes,
particularly of table-making, that could not have been faced ten





years ago because of their sheer laboriousness, can now be undertaken
without hesitation. It is also contended that the discovery
of the usefulness of an available commercial machine at least ranks 
with the design of special machines, and it is emphatically urged 
that the possibilities of existing mass-produced machines should be 
exhaustively explored before design is even contemplated. 


The Chairman announced that Dr. Comrie would be pleased to 
deal with any questions that might be put to him on the subject of 
his lecture. 

Dr. Calvert : May I ask a very elementary question about the
early part of your talk, where you said you checked the first difference
by adding it to the function and making sure that the sum was the
next tabular value of the function? Does it ever happen that you
set the function wrongly and the difference wrongly, and that these
two errors together cancel out?

Dr. Comrie : All I can say is that so far such a thing has not 
happened. Perhaps I was being rather too boastful in saying that 
we do not check the first function as we set it on the machine. One 
does it almost automatically then, although it is not essential in this 
particular process. 

A Member of the Audience : Are product moments also easy
to calculate with the National machine?

Dr. Comrie : $\Sigma xy$ cannot be calculated with this machine. If I
had had more time, I could have introduced you to a machine that
offers considerable advantages for that type of work, namely the
Hollerith multiplying punch. Numbers are punched on cards; if
there are, say, three numbers $a$, $b$ and $c$ punched, one can easily
obtain the sums of all the possible products, i.e. $\Sigma a^2$, $\Sigma ab$, $\Sigma ac$, $\Sigma b^2$,
$\Sigma bc$ and $\Sigma c^2$.

Dr. Neyman : I should like to compliment Dr. Comrie, and to 
enquire as to the efficiency of the Hollerith machine and the cal¬ 
culations that he showed. 

Dr. Comrie : I would like to point out the limitations of the 
ordinary Hollerith adding machine. The process mentioned just 
now is satisfactory only when the number of cards in each group is 
sufficient; it depends upon sorting the cards into order on each digit 
of the multiplier. The second limitation is that the adding machine 
can be used only to form products of two numbers on the card; 
if you want to find triple or quadruple products, the multiplying 
punch is necessary. Anyone considering large fields of work should 
know both machines. I admit that in many cases the adding
machine is more economical, but I contend that there are cases where 
the multiplying punch has the advantage. 

Mr. Yates : How would the multiplying punch do $\Sigma fx^3$?

Dr. Comrie : If you have $f$ and $x$ on the card, you generally want
the intermediate products as well. You multiply $f$ by $x$ and make the
machine punch $fx$ on the card, at the same time giving $\Sigma fx$. You then




multiply the punched values of $fx$ by $x$, making the machine punch
$fx^2$ on the card and give $\Sigma fx^2$; and so on.

Mr. Bartlett : Have you worked out any method for the
solution of determinants by mechanical calculation?

Dr. Comrie : No, I have not come up against that in my own
practice or in anything put to me, so that I have not tried.

Dr. Neyman : This is a very important problem and one that
interests me very much.

Dr. Comrie : If anyone brings me a sample of what he has to do,
and if he wants hundreds or thousands of the same calculation, I
shall be happy to consider the problem.

A Member of the Audience : Does not Mr. Mallock's Cambridge
machine do determinants?

Dr. Comrie : Yes, provided the accuracy required is not more
than one in a thousand.

The Chairman : It remains for me to move on your behalf a 
very hearty vote of thanks to Dr. Comrie for his interesting paper. 
He alluded to mathematicians, statisticians and all practical men. 
I am not sure that statisticians are all practical men; and if the 
audience is to be dealt with in a tri-partite classification, I must 
claim inclusion in the last-mentioned capacity, and not as having 
any of the mathematical skill which most of you here present possess. 
I was fascinated by Dr. Comrie’s performance; and while but dimly 
appreciating the value it represented to yourselves, I was amazed 
at the conjurer’s art in producing mathematical rabbits with extra¬ 
ordinary skill from every pocket he possessed. 

It has always been the pleasure of craftsmen to swap experience 
about their tools; and mechanical means of calculation and tabulation 
are a comparatively recent form of tool of great interest to many of 
us, possessing enormous potentialities for those who have particular 
jobs of work to do and will apply themselves to getting the best out
of such mechanical facilities. Dr. Comrie has adapted their use to 
many problems with great skill and ingenuity, and has arrived at 
methods which must save him and his Department a tremendous 
amount of hard labour. 

My problems and those of others present may not be precisely 
his; but it must be valuable to learn the solution of other people’s 
problems and to see how far they suggest a solution of our own. 
While appreciating Dr. Comrie’s successful efforts in his own province, 
we shall all go away and study what he has told us, and see whether 
his methods may not offer to us possibilities of improvements in our 
own methods for which we shall be indebted to him. We will all 
agree that we owe him a very hearty vote of thanks, which it is my 
pleasant duty to propose. 





Co-operation in Large-Scale Experiments. 

[A Discussion, opened by Mr. W. S. Gosset, at the meeting of the Industrial 

and Agricultural Research Section of the Royal Statistical Society, March 

26th, 1936. Sir Daniel Hall, K.C.B., F.R.S., in the Chair.] 

At the outset I must confess that the title is to some extent 
misleading: co-operation is, I am quite sure, advantageous in all 
large-scale experiments whether industrial or agricultural, but it 
happens that, though no farmer, I have only had first-hand experi¬ 
ence of co-operation in agriculture and my paper must, therefore, 
deal with that. On the other hand, there are several Fellows present 
who will doubtless be able to draw analogies from agriculture to 
industry as the general principles of experimentation are common to 
both.

Forty years ago agricultural experiments were mainly carried 
out in fairly large plots, generally without replication, and in 
consequence the soil differences between two plots which were to be 
compared were often so large as to obscure the issue. 

Then about thirty years ago, several different investigators 
harvested apparently uniform fields by small plots, and it at once 
became obvious that the variation in fertility from point to point 
in a field is so distributed that to obtain the best experimental 
results it is necessary to work with a number of small plots. These 
should be arranged so that comparable plots lie close together, and 
it appeared further that this replication of plots enabled us to 
make an estimate of the error of our results in a single experiment; 
before this it had only been possible to estimate the error of a series 
of experiments carried out at a number of stations or in a number 
of years. 

Finally, about fifteen years ago, Professor Fisher introduced the 
principle of randomizing the position of the plots in the various 
systems of randomized blocks and Latin squares with which many 
of you are familiar. This enabled us to obtain a certainly valid 
estimate of the variability of our results, though usually at the 
expense of increasing that variability when compared with balanced 
arrangements. 

Nevertheless, it must not be supposed that valuable results 
could not be obtained by the primitive methods of forty years ago; 
for example, in the 1880's and 1890's the Danes, working with
comparatively large plots, with few replications, but at several 
co-operating stations and in a number of successive seasons, were 
able to establish that Prentice was the most suitable barley to grow 
in Denmark. 




On the other hand, Mr. Yates has pointed out that it is not 
uncommon, when using the most modern methods in manurial 
experiments, to obtain a significant result on one occasion, but, on 
repeating the experiment in another year or in another field, to get 
an equally significant result in the opposite direction. 

Nor is the reason of this far to seek; among the many causes 
which influence the result of an experiment, we can only control by 
the arrangement of our plots those connected with the variation in 
fertility of the experimental area; apart from these we have the wide 
differences in soil and climate over the districts in which we wish 
to apply the conclusions which we draw from our experiments. 

Hence the old work, if repeated on a representative scale and 
sufficiently often, was able to give results which were applicable 
over a wide area, while the very accuracy of Mr. Yates’s methods 
enables him to reach significance for results of merely local value. 

Nevertheless, it would be a mistake to reduce the accuracy by 
insufficient replication, for only by repeating such work at different 
times and places can the causes of such apparent anomalies be traced, 
and for that the more we can eliminate mere soil errors the better. 

But such repetitions can only be carried out co-operatively, and 
I propose to give some instances of such co-operation, beginning with 
the simplest technique. 

Just before the beginning of this century the Irish Agricultural 
Organization Society, which later became the Department of 
Agriculture, began a research into the most suitable variety of 
barley to grow in Ireland, and this research has been continued to 
the present day. During this time three varieties of barley have 
been introduced into Ireland, after adequate evidence had been 
obtained that each was better than the barley which it succeeded, 
and the methods of seed distribution are such that after a very few 
years the new barley has replaced the old in practically all the barley¬ 
growing districts in Ireland. 

It is interesting to note that the first of the three barleys to be 
introduced was found to be identical with that which the Danes had 
proved to be most suitable for Denmark; the other two were obtained 
from it by cross-fertilization by Dr. Hunter. 

The resulting gain in yield has been remarkable, and though it 
would be easy to attach too much importance to evidence supplied 
by the official estimates, they tally fairly well with the claim which 
has been made, on the basis of the experimental plots, that there 
has been a gain of from twenty to twenty-five per cent. 

During the last ten years the official yield has dropped below
5 qrs. only once, while only twice in the previous sixty years did it
rise above that figure.





Table I.

Yield of Barley in Ireland in Quarters per Acre.

   Before Experimenting.          After Experimenting.

   1866-1870   ...   4.0          1901-1905   ...   4.5
   1871-1875   ...   4.1          1906-1910   ...   4.7
   1876-1880   ...   3.9          1911-1915   ...   4.8
   1881-1885   ...   3.9          1916-1920   ...   4.1
   1886-1890   ...   3.9          1921-1925   ...   4.1
   1891-1895   ...   4.3          1926-1930   ...   5.3
   1896-1900   ...   4.3          1931-1935   ...   5.1


The low yields between 1916 and 1925 were partly due to 
unfavourable weather, but also to the extension of arable land 
during the war, with consequent inclusion of less suitable land, 
and to the subsequent decline in farming technique owing to wages 
being high compared with prices. 

The experiments are carried out at about ten centres where three 
varieties are tested against the standard variety in one-acre plots. 
This somewhat primitive arrangement has been carried on up to the 
present day in order to provide plenty of barley for quality tests. 

In any case, after some years the weather and the barley-growing 
land of the country were sampled in a way which would be impossible 
at a single station. The number of farms should, of course, be 
larger, and doubtless it would be but for the fact that only one 
official is available for supervision, and ten farms at distances of,
in some cases, over 100 miles is as much as he can manage even when
the experiment is of this very simple type. 

The error of a comparison between two one-acre plots is large, 
and quite a number of seasons pass before enough repetitions are 
available to reduce the error to a figure which will show that a new 
variety really yields better than the standard. As, however, it is 
as necessary to sample weather as districts, this is of no great 
disadvantage. 

The order of this error is of interest, and I have examined two
series to determine it; the first was carried out between 1901 and
1906, when 51 comparisons between Archer and Goldthorpe gave
an average advantage to Archer of 7.7 per cent. with a standard
error of a single comparison of 15.5 per cent. This tallies well
enough with the traditional 10 per cent. for the error of a
comparison of a pair of plots at one station, having regard to the further
real variation due to the differential response of the varieties to
soil, climate and farming technique.

The second series was carried out between 1925 and 1935, when 
two selections of the Spratt-Archer cross were compared: they 
















differed by 0.27 per cent. in 103 trials with a standard error of 9.3
per cent.

These two estimates of the error of a comparison, 15.5 per cent.
and 9.3 per cent., differ significantly, and it is noteworthy that the
smaller figure was found with barleys which might be expected to 
react in much the same way to differences in soil and weather. 
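
The arithmetic behind these two series can be sketched as follows. The sketch assumes that the quoted "standard error of a single comparison" is the standard deviation of one paired comparison, so that the standard error of the mean of n comparisons is that figure divided by the square root of n; the function and variable names are illustrative only.

```python
import math


def standard_error_of_mean(sd_single, n):
    """Standard error of the mean of n comparisons, each with s.d. sd_single."""
    return sd_single / math.sqrt(n)


# (mean advantage %, s.d. of a single comparison %, number of comparisons)
series = {
    "Archer v. Goldthorpe, 1901-1906": (7.7, 15.5, 51),
    "Spratt-Archer selections, 1925-1935": (0.27, 9.3, 103),
}

results = {}
for name, (mean_diff, sd_single, n) in series.items():
    se = standard_error_of_mean(sd_single, n)
    results[name] = (se, mean_diff / se)
    print(f"{name}: s.e. of mean {se:.2f}, mean / s.e. {mean_diff / se:.2f}")

# Ratio of the two estimated variances of a single comparison.
variance_ratio = (15.5 / 9.3) ** 2
print(f"variance ratio: {variance_ratio:.2f}")
```

On these assumptions the first mean difference is about three and a half times its standard error, the second well under one, and the two variances stand in a ratio of nearly three to one.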

A second set of experiments has been carried out by the National 
Institute of Agricultural Botany, and I instance it to give an idea 
of the advantage of using a method which reduces the error at each 
station—namely, Beaven's half-drill strip. 

It has been said that from an experiment conducted by this 
method no valid conclusion can be drawn, but even if this were so,
it would not affect a series of such experiments. Each is independent 
of all the others, and it is not necessary to randomize a series which is 
already random, for, as Lincoln said, “ you can’t unscramble an egg.” 
Hence, since the tendency of deliberate randomizing is to increase 
the error, a balanced arrangement like the half-drill strip is best 
if otherwise convenient. 

From this work I have taken two series, one of 22 comparisons
between Spratt-Archer and Plumage-Archer barleys carried out
from 1925 to 1928 when the former yielded 6.1 per cent. more and
the standard error of a comparison was 8.1 per cent.

There was, however, one experiment in which the method was
not followed in several particulars, and if that be omitted the
standard error falls to 5.6 per cent.

The second series of N.I.A.B. experiments was a comparison
between Spratt-Archer and a selection from Plumage-Archer which
was carried out at six stations and for three years. It is thus
possible to analyse the variance, and though the numbers are too
small to give a significant difference in variance, there is an indication
that the greater part was connected with the stations. The average
superiority in yield of Spratt-Archer was 8.2 per cent. and the S.D.
of a comparison was 8.4 per cent.; this is significant for 18
comparisons, so that the main object of the experiment was attained
provided that the stations could be assumed to be a representative
sample.

The analysis of variance is as follows:

                Degrees of      Sum of        Mean
                Freedom.        Squares.      Squares.

Seasons            2              22.25        11.13
Stations           5             815.34       163.07
Remainder         10             352.26        35.23

Total             17            1189.85        69.99
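
The entries of this table can be checked against one another. The following sketch recomputes the mean squares from the sums of squares and degrees of freedom, and forms the ratios of the Seasons and Stations mean squares to the Remainder; taking the Remainder as the denominator is an assumption of this sketch, as the text does not say which test was applied.

```python
import math

# (degrees of freedom, sum of squares) as printed in the table.
anova = {
    "Seasons": (2, 22.25),
    "Stations": (5, 815.34),
    "Remainder": (10, 352.26),
}

mean_squares = {source: ss / df for source, (df, ss) in anova.items()}
total_df = sum(df for df, _ in anova.values())
total_ss = sum(ss for _, ss in anova.values())

# Variance ratios against the Remainder (an assumed choice of error term).
ratio_stations = mean_squares["Stations"] / mean_squares["Remainder"]
ratio_seasons = mean_squares["Seasons"] / mean_squares["Remainder"]

# S.D. of a single comparison from the total line.
sd_single = math.sqrt(total_ss / total_df)

for source, ms in mean_squares.items():
    print(f"{source:<10} mean square {ms:7.2f}")
print(f"total: {total_df} d.f., sum of squares {total_ss:.2f}")
print(f"Stations / Remainder = {ratio_stations:.2f}")
print(f"Seasons  / Remainder = {ratio_seasons:.2f}")
print(f"S.D. of a comparison = {sd_single:.1f} per cent.")
```

The total line reproduces the S.D. of 8.4 per cent. quoted above; whether the Stations ratio is judged significant depends on the test and level chosen.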





The remainder, of course, includes not only the error due to 
soil differences, but also those due to the local differences in climate 
within each season and to the difference between the fields used at 
each station. 

I have drawn attention to this small series because it indicates
the possibility, had there been sufficient stations, of connecting
the peculiarities of the soil and weather at the stations with the
relative yields of the varieties. Thus there was an indication that
Spratt-Archer was less superior to Plumage-Archer when the yields
were high, but it was by no means significant.

Assuming, then, that the error of the one-acre plot experiment
is of the order 12 per cent. and that of the half-drill strip 8 per
cent., the advantage of the latter is not so much that fewer experi¬ 
ments would be needed to evaluate a given difference in yield, for 
in any case it is necessary to spread one’s net widely both in time 
and space; nor is the smaller area occupied a clear gain, for it is 
offset by the necessity for closer supervision; but it does make it 
possible to contract the limits of significance so that more series 
of experiments give definite answers to the questions asked. 
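
What "contracting the limits of significance" amounts to can be illustrated with the two assumed errors of 12 and 8 per cent. The sketch below takes a mean difference of twice its standard error as the limit of significance; that multiplier, the series length of 20 and the 5 per cent. target difference are assumptions chosen for illustration, not figures from the paper.

```python
import math


def significance_limit(sd_single, n, multiplier=2.0):
    """Smallest mean difference (per cent.) reaching `multiplier` standard errors."""
    return multiplier * sd_single / math.sqrt(n)


def comparisons_needed(sd_single, difference, multiplier=2.0):
    """Comparisons required for `difference` to reach `multiplier` standard errors."""
    return math.ceil((multiplier * sd_single / difference) ** 2)


n = 20  # hypothetical length of a series of experiments
limit_acre = significance_limit(12.0, n)
limit_strip = significance_limit(8.0, n)
print(f"one-acre plots:   limit {limit_acre:.1f} per cent. with {n} comparisons")
print(f"half-drill strip: limit {limit_strip:.1f} per cent. with {n} comparisons")

# Comparisons needed to establish a 5 per cent. difference (hypothetical target).
need_acre = comparisons_needed(12.0, 5.0)
need_strip = comparisons_needed(8.0, 5.0)
print(f"comparisons needed for 5 per cent.: {need_acre} v. {need_strip}")
```

For a series of fixed length the limit shrinks in proportion to the error of a single comparison, so that a series which must in any case be spread widely in time and space gives a definite answer more often.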

I have instanced the half-drill strip, but obviously any method 
of reducing the error is of advantage, whether it is by replication 
(including, for instance, multiple Latin squares), reduction of the 
size of plot, or regular balanced arrangement.

The instances given above have been fairly simple, inasmuch as
the differential response of barleys to variations of soil and climate
is small; but even in these cases it would have been of advantage
to have spread the net wider: the next experiment to which I am
going to refer is of a more complicated nature, and is concerned with
the response of sugar-beet to artificial manures.

This has been described in the Rothamsted Report for 1934,
and though I do not propose to try to add to the full analysis given
therein, a short account of it may be instructive.

The experiment was carried out in two seasons, at 13 stations 
in 1933 and 15 in 1934; all combinations of three manures at three 
rates per acre were tried, and measurements of the weights of roots
and tops, and of percentage of sugar and purity, were made, and 
various conclusions were drawn as to the effects of the manures. 
Among others, it appeared that some of these effects differed 
significantly at different farms. 
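
The size of such an experiment follows from the description: three manures, each at three rates, in all combinations. The sketch below merely enumerates the combinations; the manure labels and the coding of the rates as 0, 1, 2 are hypothetical placeholders, since the text names only sulphate of ammonia and does not give the rates.

```python
from itertools import product

# Hypothetical labels; the paper does not list all three manures here.
manures = ("manure_1", "manure_2", "manure_3")
rates = (0, 1, 2)  # three rates per acre, coded arbitrarily

# Every combination of one rate for each manure.
treatments = [dict(zip(manures, levels)) for levels in product(rates, repeat=len(manures))]

# Station-seasons reported in the text: 13 in 1933 and 15 in 1934.
stations = {1933: 13, 1934: 15}
station_seasons = sum(stations.values())

print(f"{len(treatments)} treatment combinations")
print(f"{station_seasons} station-seasons in all")
print("first:", treatments[0], " last:", treatments[-1])
```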

The next thing, clearly, is to connect up these differences with 
the character of the soil and weather at the various farms, but 
though mechanical and chemical analyses of the soil were carried 
out, there is no mention in the report of any attempt to do this.
Presumably there was no marked connection, and further results 


are awaited, for if “8 of the 15 centres gave significant increases in
yield of roots with sulphate of ammonia, while the remaining 7
centres showed no appreciable increases,” the value of the results
to the individual farmer will be much increased by some indication
of whether his land is to be classed with the 8 or the 7. I call
attention to this in no spirit of criticism, but in order to bring out the
full possibilities of co-operation on a still larger scale.

Both Dr. Beaven and the Rothamsted school have maintained 
that their methods can be carried out by the ordinary farmer; and 
if for ordinary you substitute exceptional, I agree; but the business, 
even of the exceptional farmer, is to farm, and he cannot afford the 
time to weigh up small experimental plots when he ought to be
getting on with his work.

And so, while a co-operative series of experiments should always 
include a majority carried out on ordinary farms, there must be 
trained supervision and cultivation money, and this can only come
from the Government, working through institutions like the National 
Institute of Agricultural Botany or Rothamsted. 

Furthermore, the more complicated the method, the more 
supervision is required; one man can just look after ten experiments 
with acre plots, with half-drill strips you probably want at least three,
and for more complicated experiments even more; but farming is a
large industry, and a gain, even a small gain, per acre on 100,000 
acres soon pays for the cost of making experiments. 

Appendix 

The Error of Half Drill Strip Experiments. 

The half-drill strip technique has been criticized on the ground 
that no valid conclusion can be drawn from experiments carried out 
by it, and it may be well to examine what truth there is in the 
assertion. 

Essentially the method consists in sowing long narrow strips of
two varieties of cereals in alternation. By an ingenious arrangement
at sowing, these strips can be split longitudinally at harvest, and
each half strip of one variety is compared with the half strip of the
other adjacent to it; to balance the linear term of the fertility
slope, the series begins and ends with a half strip of the same variety.
The series is therefore of the form ABBAABBA . . . ABBA, and
to calculate the error of the difference $(A - B)$ a degree of freedom
is allocated to the fertility slope. This is determined by the
difference $(S(AB) - S(BA))/n$, where $S(AB)$ is taken to be the sum of
$A - B$ for all the comparisons AB, $S(BA)$ for all the comparisons
BA, and $n$ the number of pairs.





Thus the analysis of the variance is given in a table of the form

                    Degrees of
                    Freedom.      Sum of Squares.

Fertility slope        1          $(S(AB) - S(BA))^2/n$
Random error         n - 2        $S(A - B)^2 - (S(AB) - S(BA))^2/n$
Total                n - 1        $S(A - B)^2$


If, then, the variation in fertility consisted of random deviations
superposed on a uniform fertility slope, the procedure would be
beyond criticism; it remains to be seen how departures from such an 
ideal system invalidate the argument. 
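
The analysis just described can be sketched in a few lines. The sketch assumes that the total sum of squares, having n - 1 degrees of freedom, is taken about the mean of the differences, and that the pairs alternate AB, BA, AB, BA along the series; the function name and the test figures are invented for illustration.

```python
import math


def half_drill_strip_analysis(diffs):
    """Analyse A - B differences from pairs laid out AB, BA, AB, BA, ...

    `diffs[i]` is the yield of A minus that of B in the i-th pair of adjacent
    half strips; even-numbered pairs are in the order AB, odd-numbered BA.
    """
    n = len(diffs)
    if n < 4 or n % 2:
        raise ValueError("need an even number of pairs, at least four")
    mean = sum(diffs) / n
    # Assumption: S(A - B)^2 is read as the sum of squares about the mean.
    total_ss = sum((d - mean) ** 2 for d in diffs)
    s_ab = sum(diffs[0::2])  # S(AB)
    s_ba = sum(diffs[1::2])  # S(BA)
    slope_ss = (s_ab - s_ba) ** 2 / n  # one degree of freedom
    error_ss = max(total_ss - slope_ss, 0.0)  # n - 2 degrees of freedom
    error_ms = error_ss / (n - 2)
    return {
        "mean_difference": mean,
        "slope_ss": slope_ss,
        "error_ss": error_ss,
        "total_ss": total_ss,
        "se_of_mean": math.sqrt(error_ms / n),
    }


# Made-up check: a true gain of 2.0 for A on a uniform slope of 0.5 per half strip,
# with no random deviations at all.
gain, slope = 2.0, 0.5
result = half_drill_strip_analysis([gain - slope, gain + slope] * 4)
print(result)
```

With a perfectly uniform slope and no random deviations, the whole of the total sum of squares is accounted for by the single degree of freedom for slope, and the mean difference recovers the true gain exactly.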

The almost universal departure is that the fertility slope is not
uniform; there are, ideally speaking, parabolic terms, so that the
position AB represents a different advantage to A at different
points in the series. This will have the effect of increasing the
apparent error, since the sum of the squares of the differences,
$S(A - B)^2$, includes just as large a component due to the fertility
slope, while the component calculated, $(S(AB) - S(BA))^2/n$,
is smaller; this is because the sign of AB (and of BA) changes on
passing from a falling to a rising part of the curve. On the other
hand, there is a corresponding increase in the real error owing to the
fertility slope not being accurately balanced, this error amounting at
most to $2/n$ of the fertility slope between a pair for each change of
direction.
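
A toy calculation may make the point concrete. In the sketch below two varieties of identical yield are laid out ABBA... on sixteen half strips, first on a uniform slope and then on a purely parabolic fertility curve, with no random deviations; the particular curves, the number of pairs and the function names are all assumptions invented for the illustration.

```python
import math


def strip_differences(fertility, n_pairs, true_gain=0.0):
    """A - B for each adjacent pair of half strips laid out ABBA ABBA ..."""
    diffs = []
    for i in range(n_pairs):
        left, right = fertility(2 * i), fertility(2 * i + 1)
        if i % 2 == 0:  # pair sown in the order A, B
            diffs.append((left + true_gain) - right)
        else:           # pair sown in the order B, A
            diffs.append((right + true_gain) - left)
    return diffs


def analyse(diffs):
    """Return (mean difference, slope SS, error SS, apparent s.e. of the mean)."""
    n = len(diffs)
    mean = sum(diffs) / n
    total_ss = sum((d - mean) ** 2 for d in diffs)
    slope_ss = (sum(diffs[0::2]) - sum(diffs[1::2])) ** 2 / n
    error_ss = max(total_ss - slope_ss, 0.0)
    apparent_se = math.sqrt(error_ss / (n - 2) / n)
    return mean, slope_ss, error_ss, apparent_se


def linear(x):
    """Uniform fertility slope (hypothetical units)."""
    return 100.0 + 2.0 * x


def parabolic(x):
    """Fertility falling to a minimum in the middle of the 16 half strips."""
    return 100.0 + (x - 7.5) ** 2


n_pairs = 8
lin = analyse(strip_differences(linear, n_pairs))
par = analyse(strip_differences(parabolic, n_pairs))
print("uniform slope:  mean %.2f, slope SS %.1f, error SS %.1f, s.e. %.2f" % lin)
print("parabolic term: mean %.2f, slope SS %.1f, error SS %.1f, s.e. %.2f" % par)
```

On the uniform slope both the real and the apparent error vanish. On the parabola the slope component vanishes instead, the mean difference acquires a real error although the varieties are identical, and the apparent standard error is larger than that real error, which is the direction of bias the argument predicts for this particular case.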

Furthermore, unless the fertility slope is of a periodic nature, 
a case to be considered later, the incidence of these changes of cur¬ 
vature will be random, so that the general tendency will be slightly to 
over-estimate the error, a fault on the right side for most of us, and one 
which is compensated by the smallness even of the apparent error. 

Periodic fertility slopes may undoubtedly occur, but apart from 
those due to the works of man, they must be so rare as to add a 
negligible risk; where, however, they are due to such causes as old 
ploughman’s “lands,” it should be possible to avoid them by
inspection; even if they have been overlooked, the chance of their
affecting the mean difference is small, for to do so the period must
very nearly coincide with an odd multiple of the width of a whole
strip; in general, it is the apparent error that would be increased.

We may therefore conclude that there is a slight tendency for the
error of a half-drill strip experiment to be over-estimated, so that
somewhat fewer significant results are obtained than if the real
error could be accurately determined; this is more than made up
for by the smallness of the error itself as compared with that of most
other arrangements. 





There remain two other criticisms; firstly, that the system 
of drilling is such that half the coulters of the drill are allocated to 
one of the varieties and the rest to the other; if, then, the coulters 
on one side are badly set or stopped up, the other may have a constant 
advantage. This, though a real possibility, and one to be guarded 
against by careful inspection, is not as serious as it sounds, at all 
events with barley; for barley automatically fills up gaps to such 
an extent that the alteration in yield by large changes in seeding 
rates is almost inappreciable, so that within wide limits of faulty 
seeding it is the area devoted to the variety which counts, and not 
the exact distribution of seed within it. 

The other criticism has more substance; by the half-drill strip 
method only two varieties are directly compared. This is just what 
is wanted where a standard variety or rate of manuring is to be 
compared with a competitor for the rank of standard; but if two or 
more varieties are to be compared with the standard, their inter- 
comparison is, of course, subject to a much greater error. 

Up to the present, the half-drill strip method has, as far as I 
know, only been used for cereals in these Islands and in New Zealand, 
but it should be equally useful for such manures as can be drilled, 
and a modification has even been suggested for a forest experiment. 


Professor Fisher: When I first understood that Mr. Gosset 
was to give to this Section a Paper on Co-operation in Large-Scale 
Experiments, I had looked forward to a discourse on a topic all too 
little explored, to which “Student’s” special opportunities and
experience would have given a very exceptional interest. Co¬ 
operative experimentation, for which there is an immense future, 
not only in agriculture, but at least equally in medical and other 
research, always encounters special initial difficulties from which 
individual research is free: the difficulty of a number of persons 
representing different interests agreeing to pursue a common aim, 
or rather a connected group of aims; the difficulty of settling the 
method by which the aims agreed on shall be pursued; the pre¬ 
liminary training of co-operating staffs in technical procedures, which 
it is important they should carry out in a comparable manner. 
Personally, I should like to express the hope that later on we may have 
the opportunity of hearing Mr. Gosset’s views on these large questions.

What is my own disappointment may, I hope, be to the benefit 
of others. If Mr. Gosset has chosen to give his time to more familiar 
and elementary aspects of the subject, we can at least recognize the 
justification that these elementary aspects are still occasionally 
misunderstood, even to a ridiculous extent. We have always had, 
and doubtless always shall have, persons who like to speak with 
authority on experimentation, and whose pleasure it is to take credit 
for superior knowledge by the simple process of demanding higher 





precision. If we use ten replications, they can ask for twenty; 
if we use fifty, they can ask for a hundred. “That,” they say in
effect, “is what I should call a good experiment. All you
misguided and negligent people are, of course, content with a lower
standard than mine.” The history of the discussion of
experimentation is full of examples of this type of pose, and, though now it is
happily becoming rarer, it may be long before it is extinct. Though 
formerly irritating, it is now more simply amusing, since the obvious 
and universal fact is now widely realized that experimentation, 
like other practical activities, is an application of limited resources 
to the ends sought. Consequently, a modification of design, if it 
claims to be an improvement, must balance any expansion of work 
in one direction by economizing it in others. The ratio of information 
gained to work expended is the measure of the aptitude of an experi¬ 
mental design. Obvious as this point is, our discussion this evening 
will be an exceptional one if, in the course of it, it is not frequently 
ignored. 

The object of co-operation is, by co-ordinating the work of several 
different investigators, to obtain not only the value of each of the 
individual researches, but also additional knowledge due to combining 
their results. Non-co-operative work may sometimes have a single 
aim, though this is not always advantageous. Co-operative work 
must always have a multiplicity of purposes. In agricultural 
co-operation the special local knowledge, with its higher precision, 
but more limited application, must always be held in view, in addition 
to the general knowledge, more widely applicable, but of lower pre¬ 
cision, which a multiple trial can give also. With organized co¬ 
operation an analysis of variance can sort out the information supplied 
on the different questions, and can estimate the precision of each. 
It appears from such analyses that a single part of the general in¬ 
formation—that concerned with general averages—is sometimes 
little affected by the precision attained at individual centres, and can 
be used even when this precision is unknown. This fact should not 
be regarded as a reason for neglecting local precision, for, not only 
is the local value of the results entirely dependent upon this precision 
being adequate, but a second portion of the general information—that 
relating to the variability, and therefore to the reliability of the general 
results—also depends on the precision of the local comparisons. 

For example, it may be true that the Danish experiments to which 
“ Student ” has referred, established that Prentice was the most 
suitable barley at that time available, if any one barley was to be 
grown over the whole of Denmark. If, as is implied, the individual 
trials were of low precision, the experiments could not have shown 
that there were not in Denmark important districts, or soil types, in 
which other varieties would have done consistently better. The 
advantage of modern methods lies in their capacity to recognize the
genuineness of discrepant results obtained at different places, or at 
different times, and so to perceive the limitations of generalizations 
which might otherwise be accepted uncritically.

I am extremely glad that Mr. Gosset does not very seriously 
expect the figures for Table I to be accepted as evidence that the 
change in average yield reported is caused by the change in varieties 





grown. Apart from the special explanations which have to be 
sought for discrepant values, comparisons over a period of years in 
which cultivation and manuring have been subjected to considerable 
changes, and which cover the life-time of the cultivators and the 
returning officers, cannot legitimately support any such inference. 
It may well be an industrial convenience that the greater part of the 
land under barley in Ireland should grow the same variety. It 
cannot be argued that this convenience has been gained save at the 
expense of lower yields on farms better suited to other varieties. 
The fact remains that until the individual trials have a sufficient 
precision, and one that can be validly estimated, they cannot supply 
independent information, which it is in the interest of the individual 
farmer to possess. 

Mr. Gosset has added an Appendix on a controversial side-issue 
of less importance. Since his invention of this method Dr. Beaven 
has advocated the half-drill strip with unrelenting eloquence. As a 
great advance on previous methods, it has also received unceasing 
commendation, even from those who now prefer not to use it. It 
was put forward at a time when the nature of experimental errors in 
agriculture was little understood, and, as Mr. Gosset points out, it is 
exposed to serious criticism from several points of view. As is to be 
expected from his life-long association with Dr. Beaven, Mr. Gosset 
is concerned to minimize these criticisms. If the error is estimated 
in the way he suggests, which is not a method which has been always 
used, he thinks the error will be slightly over-estimated. This may 
be so on the average, but this is not incompatible with the error being 
sometimes largely under-estimated, and a systematic bias in the 
estimation of error in either direction is a serious drawback. More¬ 
over, these arguments a priori can only lead to a pious opinion. 
The serious fact is that the actual errors of the split-drill method are 
always unknown, and though the result of a trial may be ornamented 
by the addition of the standard error, estimated by some plausible 
process, such estimates can never be scientifically on the same level 
as are standard errors of known validity. Mr. Gosset says that the 
alterations in yield of barley by large changes in seeding rates are 
almost inappreciable. Had he said often inappreciable, one could 
have agreed; for this is true of many important disturbing factors 
other than seed rate. It is quite another matter, however, if we are 
asked to assume that systematic errors due to faulty seeding are 
always inappreciable. 

Dr. Wishart : May I say how much I have enjoyed listening 
to Mr. Gosset’s opening remarks. I must confess to a slight feeling 
of disappointment on first receiving an advance copy of the paper 
before us that the material was not more bulky, for Mr. Gosset is 
always interesting. I soon realized, however, that his series of short 
paragraphs were purposely designed to bring out a large number of 
points for discussion, and that, after all, is the function of the opener 
of a discussion. 

I propose to limit myself to certain remarks concerning the 
transition from the single experiment to the multiple one. When 
we have learnt how to carry out an accurate experiment at one time 




and place, then we can say, with a certain degree of uncertainty, 
expressed by a standard error calculated from the replicate plots, 
what differences have emerged under the particular conditions, 
that is, on the soil selected, and with its given fertility, with certain 
allocated basal fertilizer dressings and cultivation treatments, and 
under the special climatic conditions of that season and place, con¬ 
ditions which will never be repeated exactly. To secure accuracy 
in a single trial, we usually impose conditions which limit the 
general applicability of the results, e.g., homogeneous area, extra
precautions in sowing, sufficiency of nutrients other than those tested, 
etc. Sometimes, even, the site is selected so that the experiment 
may have a chance to come off ; for example, land in a low state of 
fertility is chosen in order to show the effect of a nitrogenous 
dressing on wheat. It has never been suggested that a process of 
random selection should be applied to the site of the single experi¬ 
ment or, as Dr. Sanders said in Cambridge on Monday, we might 
land up in someone’s back-yard. But it is to be remembered that the 
results can be applied with strictness only under the particular 
conditions of the experiment, while the error, which is appropriate 
to the comparison of a number of closely adjacent small plots, will 
be inapplicable to comparisons of differently treated whole fields. 

I wonder whether, in the first flush of enthusiasm arising out of 
the new technique, it was always recognized that the mean of a sample 
of plots was as important for the purpose of analysis as the variance. 
I have in fact seen published accounts of variance analyses applied 
to experiments where the means were not stated. Quite apart from 
the necessity to replicate plots in order to get a valid estimate of the 
error, the methods are designed to secure accurate and unbiased 
estimates of the mean of each treatment. That comes about in the 
ordinary trial by the method of random grouping of treatments 
in blocks or the rows and columns of a Latin square, with replication 
of plots having the same treatment. The desire to keep the experi¬ 
ment down to manageable dimensions, while at the same time in¬ 
corporating a large number of treatments, has sometimes resulted 
in quite inadequate replication. One hears experimenters say,
“I have eight treatments, therefore four replications in randomized
blocks will give me twenty-one degrees of freedom for error, which is
adequate.” Adequate, yes, for the estimation of the standard
deviation of plot yield, but is it adequate for the accurate estimation 
of a mean? I admit, of course, that special methods, such as the 
use of a factorial design, will improve the precision of the main
comparisons it is desired to make. 
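
The count of twenty-one degrees of freedom, and the contrast between estimating the error and estimating a mean, can be set out as below. The partition is the usual one for randomized blocks; the plot standard deviation of 10 per cent. is a hypothetical figure, and the function names are illustrative.

```python
import math


def randomized_block_df(treatments, blocks):
    """Partition of degrees of freedom in a randomized block experiment."""
    return {
        "blocks": blocks - 1,
        "treatments": treatments - 1,
        "error": (treatments - 1) * (blocks - 1),
        "total": treatments * blocks - 1,
    }


def se_of_treatment_mean(plot_sd, replications):
    """Standard error of a treatment mean based on `replications` plots."""
    return plot_sd / math.sqrt(replications)


df = randomized_block_df(treatments=8, blocks=4)
print(df)

plot_sd = 10.0  # hypothetical s.d. of a single plot, per cent. of the mean
se_mean = se_of_treatment_mean(plot_sd, replications=4)
se_difference = math.sqrt(2.0) * se_mean  # difference of two treatment means
print(f"s.e. of a treatment mean:       {se_mean:.1f} per cent.")
print(f"s.e. of a difference of two:    {se_difference:.1f} per cent.")
```

The twenty-one degrees of freedom fix how well the plot error is known, but each mean still rests on only four plots, so that its standard error is no less than half that of a single plot.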

This brings me to multiple trials, for here, because replication 
within a single trial is not the basis of estimation of an error applicable 
to the average responses over the whole series, there is a tendency to 
be content with very inadequate replication, or none at all, at the 
individual centre, while the importance of randomization is not always 
realized. We require an estimate of the mean yield for a particular
treatment at each centre or season. Then the error to be applied 
to the average of all centres or seasons is derived from the interaction 
between treatments and centres. Now, how accurate must the 
estimation of the mean at each centre be? Each centre can be 





regarded as a sample of certain soil and climatic conditions: thus 
there is perhaps no need for meticulous accuracy in the estimation 
of the mean yield of particular small plots in a particular field. But 
there is need for an unbiased estimate for each separate treatment; 
each should effectively sample the site chosen. This seems to imply 
that the plots should still be small and should be interwoven in some 
sort of randomized pattern, and that some replication should exist 
at each centre. This is, I believe, the reason behind the half-drill 
pattern of variety trials advocated by Dr. Beaven and Mr. Gosset, 
which are not intended to be taken singly. 

With such methods we are fairly confident as to our mean values, 
and the next thing is to spread our net widely, and, if results are to 
be generally applicable, i.e., to the country as a whole, we must
sample all counties and all soil types, etc., and carry on for a number 
of seasons. But we must distinguish between results which may be 
demonstrated to hold generally, as that a new variety will on the 
average be superior over the whole country to an old one, and results 
which vary between the centres or seasons. It may be that a difference
is noted in some places, but not in others. This would be shown
in a significant interaction between treatments and centres, but it is 
this interaction which will be used as a measure of general error, 
so that the general error will be increased by such interaction, and 
we might find, e.g., that the mean difference between two treatments
or varieties was insignificant over all experiments, though it might 
be large in some. As Mr. Gosset has said, the farmer will want to 
have results applicable to his conditions. The two things, then— 
the general result and the individual result—have to be kept distinct 
in our minds. If we want to be able to demonstrate differences in 
response at different centres, we must arrange, as Rothamsted and the 
sugar-beet factories did in their experiments, that the trial at the 
individual centre be of a sufficiently detailed type to permit of 
accurate estimations of treatment yields and of their error. I 
gather that Mr. Gosset approves of such detailed trials, but would 
like to go further by connecting up the results with climatic and soil 
conditions. 
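
How a difference that is large at some centres can nevertheless fail to be significant in general may be shown with invented figures. The sketch takes two treatments only, for which using the interaction with centres as error is equivalent to judging the mean of the centre differences by their own variation; the six differences are hypothetical and are not drawn from any trial mentioned in the discussion.

```python
import math
import statistics

# HYPOTHETICAL percentage differences (new minus standard), one per centre:
# a marked response at three centres and none at the other three.
differences = [12.0, 10.0, 11.0, 0.0, -1.0, 1.0]

k = len(differences)
mean_diff = statistics.mean(differences)
sd_between_centres = statistics.stdev(differences)  # divisor k - 1
se_general = sd_between_centres / math.sqrt(k)      # error of the general mean
ratio = mean_diff / se_general

print(f"mean difference over {k} centres: {mean_diff:.1f} per cent.")
print(f"s.d. between centres:            {sd_between_centres:.2f}")
print(f"general standard error:          {se_general:.2f}")
print(f"mean / standard error:           {ratio:.2f} on {k - 1} d.f.")
```

The ratio here is about 2.2, short of the tabulated 5 per cent. value of 2.57 for t with 5 degrees of freedom, although the response at three of the centres is plainly real; separating the two questions requires a valid estimate of error at each centre.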

If this discussion does something to help on the establishment of a 
really national series of trials, as fully adequate in their sampling of 
the soil and climatic conditions of different agricultural districts as 
they are in their sampling of the responses at one particular place, 
then it will have been worth while, while the trials, if they come off, 
should do a great deal to increase our knowledge of the factors 
affecting productivity. But it requires to be done nationally. We 
have, in breeding work, the National Institute of Agricultural 
Botany. Then we have institutions like Rothamsted, and provincial 
centres working in association with the regional colleges and ex¬ 
perimental stations, under the general direction of the Ministry of 
Agriculture. The network is there, and to some extent, the organiza¬ 
tion. It is to be hoped that the necessary money will be forthcoming 
since it is clear that it would be well spent in this way. 

Mr. Yates said he would like to express his entire agreement
with the remarks of Professor Fisher on the importance which
co-operative experimentation was likely to have in the future. Indeed,
it seemed to him that the present ineffectiveness of science in dealing
with economic, sociological and biological problems was very largely
due to a complete lack of any adequate technique of co-operative
experimentation. Why was it that the wireless sets, cars, and
all the other mechanical toys that graced the present civilization were
so perfect, while man remained in ignorance of the biological
consequences of the most simple and elementary changes in his
environment? Why was it that so much should be known of the genetics
of the fruit fly, Drosophila melanogaster, and so little of the genetics of
Homo sapiens? It was, he suggested, because these latter problems
could only be studied by the method of collective research, involving
the co-operation of large numbers of individuals, whereas one man
could usefully amuse himself with wireless sets, or even with
Drosophila.

To take a very simple example, it was surely a remarkable fact 
that the need of so commonplace an article of diet as milk was a matter 
of controversy, and that the probably harmful effects of pasteuriza¬ 
tion had never been adequately investigated, although this treat¬ 
ment was now advocated as a compulsory measure? The one 
collective experiment that was designed to test these two very simple 
points was a sorry example of the incredible waste of effort that 
could occur through lack of attention to the principles of experimental 
design. Although nearly ten thousand school children received a 
supplementary ration of milk for about four months, some raw and 
some pasteurized, the difference between the effects of raw and 
pasteurized milk on growth in weight and height was left in ob¬ 
scurity, owing to the fact that the children at each school received 
either wholly pasteurized or wholly raw milk. Indeed, the situation 
was worse than it appeared at first sight; for this very obscurity had 
allowed those responsible for the investigation to draw the conclusion 
that the effects of raw and pasteurized milk were equal, whereas in 
fact it appeared from their own published figures that pasteurized 
was considerably inferior to raw milk!

This brought him to one of the primary difficulties of collective 
experimentation—namely, that of persuading the individuals 
concerned that it was essential that they should each of them under¬ 
take two or more treatments, and what was even more commonly 
overlooked, that they should assign these treatments at random to 
the experimental material available. It was fatally easy from 
humanitarian motives to give the milk to more weakly children. 
It was fatally easy to assign the more weakly animals to the role of 
controls, on the ground that the animals under this treatment were 
not likely to come to much good anyhow. 

Then there was the further task, already referred to by Professor 
Fisher—namely, that of persuading those who undertook co-operative 
experiments of the necessity of carrying out the agreed scheme, and 
not introducing what they considered might be improvements, and 
which in fact might have been improvements had they been thought 
of and carried out from the start by everyone. A very real sacrifice 
of initiative must be made by the collective experimenter. One 
of the great tasks before the organizer of collective research, therefore,
was to arouse and sustain the interest of his co-workers. It was a 
task that involved much hard work, much care and thought, and 
much tact. In consequence, it was apt to be neglected, for scientific 
workers had in general little taste for this sort of thing. 

Another difficulty which frequently stood in the way of organizing 
co-operative research was that of providing adequate money grants. 
Moreover, the attitude that since science was its own reward, therefore
co-operative research should be undertaken with very inadequate
or no remuneration for the work involved, was, in his opinion, wholly 
mistaken; for even if one could expect individual scientific research 
to be carried out on a voluntary basis—which in fact was not now 
the case—it would not be fair to expect a man to sacrifice a great 
part of the freedom which he possessed in such research without 
some compensation. The attitude that co-operative research 
brought its reward to the individual workers concerned in the 
shape of additional information obtained, was too ridiculous to 
require any comment, seeing that the results of such research were 
usually published; but such an attitude was very frequently taken. 
An important advantage resulting from the provision of adequate 
remuneration for the work involved was that continuity and consistency
of research could thereby be better ensured.

The task before the organizer of co-operative research was then 
fourfold. Firstly, he must put forward a programme of research 
which was likely to provide an answer not only to the questions 
which were dear to his own heart, but also to those which were of real 
common interest. Secondly, he must devise a technique which was 
within the competence of his co-workers, and was also as efficient as 
possible. Thirdly, he must gain and sustain the active interest and 
co-operation of these workers. Fourthly, he must arrange for 
adequate financing of the scheme. 

Mr. Yates was best acquainted with co-operative research in 
agriculture. He would like to convey his thanks to Mr. Gosset for 
his admirable opening to the discussion. Mr. Gosset could be 
regarded as the father of modern statistical methods (which had 
themselves been most fully developed in agriculture), for the first 
test of significance applicable to small samples, now known as the 
t test, was due to him. 

One or two points might be mentioned. Mr. Gosset had rather 
implied that co-operative agricultural research was little practised 
before 1900. He had perhaps failed to give due credit to the very 
large amount of such research conducted on the Continent during 
the latter half of the last century. Mr. Yates hoped Dr. Crowther, 
who was present that evening, would tell them more of this work. 
Perhaps Mr. Gosset had left the impression that co-operative research 
was less extensive in the Colonies and Dominions than it was at the
moment. A very large number of countries were at present working 
on co-operative lines very similar to those described by Mr. Gosset. 

Mr. Yates entirely agreed with Mr. Gosset that the very simple 
experiments were likely to have their place in agriculture for some 
time to come; but it should not be forgotten that there was a large 
overhead for each individual experiment in the shape of that component
of supervision which was constant for an experiment, whatever
its size: for two small experiments, for example, twice as much work
was involved in getting in touch with farmers, as for one. There 
was, however, no excuse for not using proper statistical principles in 
the design of such experiments. It was even more important to 
re-randomize the treatments on each area when there was only a 
single replication than when there were a large number. The 
consistent arrangement of the treatments, N, NP, NPK, NK, PK, 0
in this order was wholly deplorable, both because the plots at the 
edges were very frequently worse than those in the centre, and because 
each treatment comparison had a different error variance depending 
on whether the plots were close together or widely separated. 
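The re-randomization recommended above can be sketched in a few lines. This is only an illustration under stated assumptions: the site names and the function `randomized_layouts` are hypothetical, and the six treatment labels are those quoted in the text. Each site receives its own independently shuffled plot order, so that no treatment is always at the edge and no pair of treatments is always adjacent.

```python
import random

# Treatment labels as quoted in the discussion; "0" is the unmanured plot.
TREATMENTS = ["N", "NP", "NPK", "NK", "PK", "0"]


def randomized_layouts(sites, treatments=TREATMENTS, seed=None):
    """Return a freshly shuffled plot order of the treatments for each site.

    `sites` is any iterable of site names (hypothetical here). A seed may be
    given so that the allocation can be reproduced and audited later.
    """
    rng = random.Random(seed)
    layouts = {}
    for site in sites:
        order = list(treatments)  # copy, so the master list is untouched
        rng.shuffle(order)        # independent random order for this site
        layouts[site] = order
    return layouts


if __name__ == "__main__":
    for site, order in randomized_layouts(["Farm A", "Farm B", "Farm C"], seed=1936).items():
        print(site, order)
```

Recording the seed is a design choice, not something the speakers discuss; it merely lets the organizer show afterwards that the layout was drawn at random and not chosen.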

Mr. Gosset had mentioned a co-operative manurial trial on sugar
beet conducted from Rothamsted, and had, perhaps accidentally, 
created the impression that they had not troubled greatly about the 
differences in soil type. The whole subject bristled with difficulties, 
but it would be unfair to Dr. Crowther to imply that this aspect of 
the subject was being neglected; it was essentially one of the chief 
objects of the set of experiments, which was still being continued. 

In conclusion, Mr. Yates said he would like to mention a piece 
of agricultural co-operative research which was being conducted 
from Rothamsted, and which provided a good illustration of his 
previous remarks. This was the scheme of sampling observations 
on the growth of the wheat crop under the auspices of the Agricultural 
Meteorological Committee of the Ministry of Agriculture, the Scottish 
Agricultural Board and the Meteorological Office. Simple quantitative
measurements of two or three varieties of wheat were taken at
each of the ten centres, with the object of elucidating the effects of the 
weather on the growth of wheat and the interconnections between the 
various stages of growth. It was perhaps indicative of the difficulty 
with which even the simplest co-operative research was at present 
faced, that this scheme took many years to establish. 

The scheme was truly co-operative. The observers were for 
most of their time engaged on the other tasks at agricultural colleges 
and similar institutions. It was fortunate that a small grant was 
received from the Ministry. Provision had been made for the study 
of results as they accrued; a proper and rigorous technique had been 
co-operatively evolved and was being followed by the various 
observers. The scheme was in fact an excellent demonstration of the 
possibilities of organized collective research, and he was glad to be 
able to say that it was already yielding results which were far ahead 
of anything previously obtained in this field of research. 

There was no need to discuss details here, but one point of some 
importance might be mentioned: every station was visited every 
year, and every station sent one or more representatives to a conference 
of observers held each year at Rothamsted. This arrangement was 
undoubtedly a very great contribution to the success of the scheme. 

Dr. Hunter did not think he could add anything very important 
to the discussion from the statistical point of view; but he welcomed 
this opportunity of paying his tribute to Mr. Gosset for the assistance 
he had given not only to him personally, but indeed to all agricultural 
experimenters. 





There were several points in the paper to which he would like to 
refer, the first being the reasons for the size of plots—one acre— 
adopted for the scheme of experiments in Ireland. One condition 
operating here was the provision of pabulum for the malting tests 
carried out by Messrs. Guinness, and the other, the educational point 
of view. The general outlay of the experimental work started in 
Ireland in 1900 was the only kind possible, as the farmers viewed
anything in the nature of small plots with suspicion, and to influence 
them at all it was necessary to provide, as far as possible, a visual 
demonstration of differences. That, coupled with the reason of 
providing material for malting, influenced the choice of the size of 
plot, which for two years was two acres, and later was reduced to one 
acre. 

As to the official yield figures presented by Mr. Gosset, the official 
yield of all crops in Ireland tended to be lower than the actual yield. 
The farmers in that country might be peculiar in this respect, but 
they always liked to represent that their yields were rather lower 
than they actually were. Consequently it is not surprising that the 
official figure of the yield of crops was always lower than the figure 
derived from very representative experimental areas. The trend 
of the Irish barley figures tended to be very much the same as that 
obtaining in Denmark, where there had been a gradual increase 
in the yield of that crop as shown by the official returns in both cases, 
and these increases synchronized with the introduction of certain 
varieties. The increase that Mr. Gosset had shown for the quinquennial
period 1911-15 was, indeed, probably the result of the
introduction of one hundred quarters of Prentice barley from Denmark
in 1906. This seed was distributed amongst farmers in the
first, second and third years of its propagation, and ultimately the 
whole of Ireland was seeded with the produce of that particular lot 
of seed. Prentice had been found to be the most prolific variety in 
Denmark, and according to the Irish experimental figures was the 
best variety for that country also. 

Mr. Gosset did not mention one of the most important features 
of the Irish work, and that was the co-operation of Messrs. Guinness 
with the Ministry of Agriculture, in carrying out the barley investigations
he had alluded to. It was largely due to their co-operation,
not only in assisting by malting the produce of a very long extended 
series of plots, but also financially, by bearing the greater part of the 
cost of the whole of the investigations, that the work was initiated 
and is still being prosecuted. 

Mr. S. Bartlett wished to pay his tribute to Mr. Gosset for his 
interesting paper, one outstanding point of which was that it was 
expressed in simple language; yet, when one got down to it, it was 
found to contain much pith, and even more would be found on 
studying it than appeared at first sight. 

Most of the discussion so far had centred on experiments on plots 
and crops. The point in which he was more interested was experiment
on animals, and he thought that there was something to be
said for co-operative experiment in this branch of study. 

Mr. Bartlett said he would like to make a few remarks about an
experiment on a fairly large co-operative basis which was now being 
run at Reading with dairy cows, the idea being to test the effect of 
certain diets on the milk yield. They were driven to co-operative 
experiment because it was quite impossible in one farm to get a 
sufficient number of cows to measure with any degree of accuracy 
two conditions so closely allied as to yield a difference of, say, 5 per 
cent. Having set about the idea of a co-operative experiment, 
they wondered whether it should be simple or complex, and again 
they were driven to simplicity, because it was quite impossible to 
ask farmers to co-operate in a complex experiment, the result being 
that they decided to test only two diets, and make a comparison 
between them. It would have been quite impossible to persuade 
the farmers to co-operate if they had asked for a bigger variety in
experiment than simply a control of two items. The system was to 
approach the farmers and to divide their herds into a succession of 
pairs, taking all the animals on the farm and pairing them in the 
same way. Whether these pairs were ultimately used remained for 
the Reading research authorities to decide, and the decision whether 
a pair should be used or not depended upon whether both animals 
in the pair conformed to certain minimum standards which were 
laid down. 
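The paired scheme described above lends itself to a simple paired comparison. The sketch below is illustrative only: the speaker does not say how the two diets were allotted within a pair, so the random allocation in `assign_diets` is an assumption, and both function names and any figures passed to them are hypothetical. `paired_t` computes the mean difference in yield between the two members of each pair, its standard error, and their ratio (Student's t on n - 1 degrees of freedom).

```python
import math
import random
import statistics


def assign_diets(pairs, seed=None):
    """For each pair (cow_a, cow_b), decide at random which cow gets diet 'A'.

    Random allocation within pairs is an assumption of this sketch; the
    discussion does not state how the diets were actually allotted.
    """
    rng = random.Random(seed)
    allocation = []
    for cow_a, cow_b in pairs:
        if rng.random() < 0.5:
            allocation.append({cow_a: "A", cow_b: "B"})
        else:
            allocation.append({cow_a: "B", cow_b: "A"})
    return allocation


def paired_t(differences):
    """Return (mean difference, standard error, t) for within-pair differences.

    `differences` holds one value per usable pair: yield on diet A minus
    yield on diet B. At least two pairs with non-identical differences
    are needed.
    """
    n = len(differences)
    mean = statistics.mean(differences)
    se = statistics.stdev(differences) / math.sqrt(n)  # sample s.d. / sqrt(n)
    return mean, se, mean / se
```

Because each difference is taken within a pair of similar animals on the same farm, farm-to-farm variation cancels out of the comparison, which is what makes a difference of a few per cent measurable at all.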

Mr. Bartlett thought it might be of interest to raise this question 
of animal experimentation, and he would especially like to know 
whether anyone had been able to arrange a complex experiment 
with animals on a co-operative basis. So far as he knew, it was almost 
impossible to arrange, but if it could be arranged, it would in some 
ways be an advantage. 

Mr. Fairfield Smith said Mr. Yates had mentioned that there 
was, perhaps, more co-operative research in the Dominions than was 
realized, and he would like to confirm this statement. For example, 
in addition to their own experiment farms, the New South Wales 
Department of Agriculture organize annually on farm lands more 
than 200 wheat variety trials and 100 fertilizer trials throughout 
the wheat belt of their state, an area about 500 miles north and south 
by 100-200 miles east and west. 

After Mr. Gosset’s remark that Prentice barley was in general 
the best variety in Denmark and Ireland, he thought it might be 
interesting to give the results of experiments in New South Wales, 
observed by Miss Allan and himself. The data were not such that 
the significance of differences at individual places nor the amount of 
interaction between varieties, places, and seasons could be assessed, 
but to general observation the outstanding feature was that certain 
varieties showed themselves good or indifferent with greater consistency
than had been expected. These results rather shook his
faith in the common teaching that it was necessary to breed varieties 
adapted to restricted localities, and such meagre evidence as had 
since come his way, before Mr. Gosset’s Table II, presented in the 
paper, suggested that for fairly large areas interaction of varieties 
and localities might be not greater than interaction of varieties and 
seasons in one locality. Mr. Gosset had brought an exception to that 
rule into his paper. 





As indicated by Mr. Gosset and Professor Fisher, this did not 
mean that investigation into local differences should be abandoned 
in favour of considering general differences only. On the contrary,
as things stood they were left in ignorance as to what areas should 
form the breeder's geographical units, and these could be determined 
only after detailed investigation of interactions. Varietal adaptation
to wide geographical variations might be illustrated by a European 
experiment organized by Boekholt of Landsberg-Warthe in Germany 
which deserved mention at this meeting if only because it provided 
an example of international co-operation. Four wheat varieties, 
selected from those commonly grown in Sweden, Holland, Germany, 
and Hungary, were grown in the three years 1930-32 at five places— 
namely, one experiment station in each of the four countries of origin 
of the varieties and in Belgium. Lengths of the growth periods, 
ear numbers, etc., were observed. The experiments were not such 
as would satisfy a statistician; but they indicated, as might be expected,
that varieties tended to yield best in their home country.
The results suggested a relationship between varietal differences 
and adaptation corresponding to the contrast which existed between 
English and Australian wheats. In the coastal climates of Belgium 
and Holland the highest yield was given by the variety with few ears 
and large ears—like the English types—whereas the small but many-eared
type—such as was found in Australia—was best in the continental
climate of Hungary. The authors suggested associating
these characters with growth periods, photoperiodism and length of
day, items of considerable interest to physiologists, but which need 
not be discussed here. 

Similar differences must exist on a smaller scale within countries, 
but to track them down and relate them to their conditioning 
factors, more accurate experiments susceptible of statistical analysis 
were required. A beginning had been made in Australia by the 
Council of Scientific and Industrial Research, whose Genetics Section, 
with the assistance of the Waite Institute and the State Departments 
of Agriculture, had organized an experiment with about ninety wheat 
varieties grown at four places in New South Wales, South Australia 
and West Australia. Observations were taken on growth periods, 
tiller numbers, ear numbers, grain size, etc. The first year's results 
were just becoming available when he (the speaker) left Australia. 
Although no statistical analysis of interactions had yet been attempted 
and, as was usual in such work, a number of misfortunes prevented the 
first year's results from being all that could be desired, the consistently 
good or bad behaviour of certain varieties at all stations seemed to be 
again a feature. At the same time certain broad features limiting 
the range of variety types within which such relations could be maintained,
such as that late varieties failed in Western Australia, were
also evident. By research of this nature, accompanied by detailed 
knowledge of experimental errors, it was hoped to obtain some 
inkling of the ranges of varietal adaptation and of the environmental 
variables which were most important in relation to this problem. 
When those environmental factors which could best be met by 
breeding specially adapted varieties had been determined, the 
geographical units to be adopted by breeders might then be mapped
out according to the relative ranges of geographical and seasonal 
variation of these factors. 

Dr. E. S. Beaven was sorry that owing to a physical disability 
he had been unable to hear more than a few words of the discussion, 
but a friend had given him one or two notes of some of the things 
that had been said. Also he had had the advantage of seeing the 
proof of Mr. Gosset’s paper, in which some space was given to an 
examination of the value of a particular method of experiment, 
namely, the half-drill-strip system—with which he had something to 
do. He was not sure whether Professor Fisher had damned it with 
faint praise, or unqualified censure, but in either case his withers 
would be unwrung. 

In the paper Mr. Gosset dealt with co-operative experiments, and 
stressed the need for co-operation, more particularly in agricultural 
experiments, and Dr. Beaven had had a long experience of this. 
The first part of Mr. Gosset’s paper referred to those Irish barley 
experiments carried on for about twenty-five years which, following 
on some Danish experiments in the same direction, might be called 
classical. Their practical value to British agriculturists, at any 
rate, whatever might be the case with regard to many hundreds of 
experiments which had been carried out during the last five or six 
years by various “random” methods at Rothamsted and other
places, was quite certain, in that these Irish experiments had led 
to quite definite economic results. There were other series of experiments
for which the same thing could be claimed; he referred to those
of the National Institute of Agricultural Botany, to which also Mr. 
Gosset referred, and which were being carried out by the half-drill- 
strip system. It might be interesting to note that the results of the 
Irish experiments were fully confirmed by these experiments which 
had gone on now for about fifteen years consecutively, always on 
this one system and at half a dozen different stations in England. 
The variety of barley to which Mr. Gosset referred—namely, Spratt- 
Archer—came to England in 1922. Naturally it did not spread 
very quickly for the first year or two, because farmers were conservative
people, but it was interesting to note that the first time
it appeared in a list of varieties sent to the Official Seed Testing 
Station (and about 2,000 samples of barley were sent every year for 
testing) was in 1925, when this variety was responsible for 6 per
cent. of all the samples sent. There was a steady increase year by
year in this proportion, and last year that variety, which was unknown
in England before 1922, reached 32 per cent. of the total.
That was an outstanding example of the value of co-operative 
experiments of this kind. The more complex “randomized”
experiments undoubtedly had a great value for the purpose of 
illustrating theories of probability, but he was thinking of their 
comparative value in terms of pounds, shillings and pence to the 
growers. 

There was another set of experiments which he had referred to in 
a paper published in the Journal of the Royal Agricultural Society 
twenty-five years ago (Vol. 70, 1909, “Pedigree Seed Corn”), a
co-operative experiment on a big scale carried on for ten or twelve
years in the State of Indiana, with maize, with interesting results, the 
value of which depended entirely upon its co-operative character. 
The experiment was carried out by the University of Indiana. 
Twenty varieties of maize were picked out from the many grown in 
the State, and the State was divided into twelve districts. Five 
of these varieties were sent to each of between two and three hundred 
different farmers, and estimates of their yields were obtained. Each 
of those individual estimates no doubt had a considerable error, 
some possibly of 50 per cent., but when one remembered that there 
were between 200 and 300 in each year, and for several succeeding 
years, he would like to ask the theoretical statisticians whether 
these errors of estimate were not likely to be unbiased and to more 
or less balance one another? He had a vague idea that the whole
theory of statistics depended upon the view that unbiased errors 
had a way of balancing one another. He knew there was something 
called “standard error.” He remembered a professor of geology
saying that the green-sand formation was so called because it was
neither green nor sand; in the same way he thought “standard
error” was so called because it was neither standard nor error, and
he commended that somewhat crude idea to some of his statistical 
friends for their criticism. Certain it was, that this Indiana 
experiment to which he referred did lead to some very valuable 
conclusions. 
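The question put to the statisticians has a standard answer, which may be worked through as a rough sketch. It rests on assumptions not stated by the speaker: that the individual errors are independent and unbiased, that they share a common standard deviation, and that the figure of 50 per cent. may be read as that standard deviation. The value n = 250 is simply an illustrative point within the quoted range of 200 to 300 estimates. The standard error of the mean of n such estimates is then

```latex
\[
\mathrm{SE}(\bar{x}) = \frac{\sigma}{\sqrt{n}}, \qquad
\sigma = 50\%,\; n = 250 \;\Longrightarrow\;
\mathrm{SE}(\bar{x}) = \frac{50\%}{\sqrt{250}} \approx 3.2\%.
\]
```

On these assumptions the errors do largely balance one another, and further years of trial reduce the figure again. A bias common to all the estimates, however, would not be reduced by averaging.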

Mr. Gosset suggested that one way in which co-operative experiments
might be useful would be to relate the manuring of land to its
needs—in other words, to suggest the reason why, for instance, in 
one particular place 3 cwt. of nitrate of soda and 3 cwt. of potash 
salts and 3 cwt. of phosphates might give worse results than 1 cwt. of 
each? He was referring to a very complex experiment now being 
carried out by the Lawes Agricultural Trust, in which no less than 
twenty-seven different questions were being asked of nature at one 
time. Professor Fisher had told them that the more questions they 
asked of nature at any one time, the more valuable would the answers 
be. It was open to argument that if single questions were asked 
twenty times they would be more likely to get reliable answers. 
For instance, the conclusion which followed from the Indiana 
experiments was that the variety with the longest possible growing 
period in any particular part of the State was the variety which gave 
the highest yield; in other words, the experiment did more than 
provide a set of merely empirical figures. It provided a rational 
explanation, and it turned out to be of great value to the maize- 
growers of the State. 

Sir Daniel Hall said that he had a certain paternal interest in 
this subject, because a long time ago he began to study this question 
of the relative sizes and magnitude of the area about experimental 
plots, and now, as he happened to be Chairman of the Council of the 
National Institute of Agricultural Botany, he was particularly 
interested, as they were concerned with what manner of experimentation
should be adopted in these national trials in order to get information
for the farmer as to the relative merits of different varieties, and
they still remained faithful to the half-drill system of testing as
against a theoretically more accurate randomizing in a Latin square. 
He thought the reason they stuck to that method of experiment was 
that it fitted with the normal way the farmer managed his land. 
They had to consider the magnitude of the work that had to be carried 
out at those different centres which were not experimental stations 
pure and simple, and so could not obtain the degree of supervision 
and accuracy of work that characterized a station like Rothamsted. 
It was necessary to consider in the design of the experiments how 
far the actual technique would fit in with the normal methods of 
farming, so that in ordinarily skilled hands it could be carried out 
efficiently. In any experimentation, a great deal of the validity 
would depend upon the adaptation of the method of work to the 
personal efficiency of the available experimenter and the physical 
conditions under which the trials must be carried out. Beyond that, 
they wanted to ensure that the experiment should carry a certain 
amount of self-criticism, that the replications would to a certain extent 
check the experimenter, and that they would again check the general 
suitability of the station for such experimental work. 

The one thing he would plead for in all these experiments was that 
they should be watched and looked over and designed with as much 
of an agricultural eye as possible. He had so often seen the necessity 
of criticism of the conditions under which the data were obtained, 
as of more significance than the data themselves. The statistician 
worked upon such figures as were presented to him; he ought to be 
able to criticize not merely from the design of the experiment but from 
his own personal experience how trustworthy those data were likely to 
be and from what unexpected source of error they might be affected. 

Mr. Gosset had always been so characterized by his immense 
appreciation of the workaday aspect of the conditions under which 
experimentation was carried out. All who had been experimenting 
owed Mr. Gosset a great debt of gratitude, and on behalf of all present 
he wished to thank him for having brought them to-night the 
fruits of many years of work and thought upon this subject. He 
had great pleasure in moving a vote of thanks to Mr. Gosset. 

Mr. M. S. Bartlett sent the following contribution after the 
meeting: The question of co-operation in dairy cow nutrition 
experiments was raised by my namesake, Mr. S. Bartlett, during 
the discussion. While appreciating the difficulties to be met with in 
experiments of this kind, I do not see why something a little more 
elaborate than the simple testing of two diets cannot be achieved. 
Mr. Bartlett did not specify the precise nature of the trials he had in 
mind, but I would remind him that a winter feeding trial (where 
diet can be well controlled) was satisfactorily carried out at Jealott's
Hill in the winter of 1934-35 with four treatments. There would
seem to be little objection to carrying out experiments on similar
lines, but arranging the separate “blocks” of cows at different
centres, especially as nothing analogous to the local differences 
peculiar to field crops is likely to operate in such experiments. 

Mr. Gosset’s reply was as follows : Thank you very much for 
your appreciation of the paper. I am not going to say very much
about what has been said because I agree with practically everything. 
At the same time, Professor Fisher must not suppose that I was 
advocating the acre plot as the ideal for experimenting, or even the 
half-drill system. Any method that will reduce the error is good 
enough for me. 

That is an old matter of controversy between Professor Fisher 
and myself. He says to me, “Your half-drill strips have no validity
and conclusions cannot be drawn from them”; I say to him, “Your
errors are so large that no conclusions are drawn.” Neither of these
criticisms is true, and the one is about as good as the other. 

Nevertheless, if the error is not reduced below a certain point, 
either by replication or an intelligent layout, it may regrettably 
happen that no conclusion can be drawn, and the whole labour is 
in vain. 

Professor Fisher agrees with me that official statistics are a slender 
guide; I put them in merely as a “corroborative detail”; the real
evidence is given by the experimental plots which I have described, 
and which show that there was to be expected just such an increase 
as has actually occurred. As regards the general outline of the 
paper, I meant to show the possibilities of co-operation in agriculture. 
Even if you use the simplest method you get valuable results by 
co-operation; if you reduce the error you get more, and if you so 
arrange matters as to be able to analyse the variance you again get 
more; while if you can go further and calculate the correlation with 
the various characters of the soil, you will get more still. We have 
not reached that yet, though Rothamsted is approaching it.

With regard to the possibility that some farms or even areas lose 
by sowing a variety of barley which suits the country as a whole, 
I agree with Mr. Smith; our experience has been that if you find a
particular barley is better than another in one place, then within a
reasonable range of climate and soil, it is pretty certain to beat it
anywhere else. 

Mr. Yates said that I had not mentioned a lot of early Continental 
work. I am not going to say that I know all the work done on the 
subject, but I am well aware that there has been a lot of co-operative
effort, in New Zealand, Australia, and, especially, in the United
States and on the Continent. I only required one or two instances 
for my argument and naturally chose those of which I could most 
easily give details. 

Dr. Wishart’s contribution gave me particular pleasure, for he 
seemed to have understood what I was driving at better than I 
did myself, and certainly expressed it more clearly. 

With regard to the question raised by Mr. Bartlett and commented 
on by his namesake, complex experiments with animals can be, and 
in fact have been carried out at one or more experimental stations. 
This, however, is hardly the same thing as a co-operative experiment 
carried out at ordinary farms where the difficulties of multiplying 
the experimental diets are almost insuperable. On the other hand, 
the restriction to two diets is balanced by the advantage of sampling 
widely the variation in farming practice, which, pace Mr. M. S. 
Bartlett, may in such experiments play much the same part as soil 
differences do in Agronomy. 





Statistical Methods Applied to the Manufacture of 
Spectacle Glasses. 

By C. E. Gould and W. M. Hampton, PhD., B.Sc., F.Inst.P., A.I.C. 

[Read before the Industrial and Agricultural Research Section of the 
Royal Statistical Society, May 28th, 1936, Mr. B. P. Dudding in the Chair.]

The Royal Statistical Society recently suggested that there must 
be many problems in industry which could provide data of interest 
to the statistician. The present paper gives a discussion of an 
attempt to use statistical methods in an industrial problem of 
considerable complexity, and the figures used and the conclusions 
drawn are put forward, together with the original data, in the hope 
that the resultant discussion will be of use to two bodies of people. 
In the first place, the discussion may help in convincing industrialists 
that the use of statistical methods can enable conclusions to be 
drawn from a mass of apparently unconnected figures; in the 
second place, the criticisms of the professional statistician will be of 
considerable use to the authors where these conclusions confirm or 
contradict those already drawn. As Fellows of the Society are 
probably not familiar with the normal process of manufacture of 
spectacle glass, a brief description is given below of the essentials in 
the manufacture. 

1. Description of Process. 

Spectacle glasses, like most other types of glass, consist essentially 
of a fused mixture of sand, alkali, and certain metallic oxides. These 
materials are melted together in pots built of fireclay in a furnace 
which is normally heated by producer gas, and the glass is then 
made into the form of cylinders, which are subsequently flattened 
and cut up for use by the optician. 

The materials are subject to analytical control, and the varia¬ 
tions in the properties of the glass due to changes in the materials 
are extremely small. The pots are built by hand from a mixture 
of selected natural clays combined with a definite proportion of 
crushed burnt pot in order to diminish the contraction of the 
natural clay on burning. A mixture of clays is used in order to 
diminish to some extent the substantial range in chemical composition 
of the individual constituents. The pots are built slowly, allowing 
time for each section to dry somewhat before the subsequent layers 
are put on. The whole process of building a pot occupies anything 
from 3 to 6 months. The pot is not usually used until it has 
been dried steadily for about a year. When required for use 



138 Gould and Hampton —Statistical Methods Applied [No. 2 , 


the pot is transferred to a cold furnace known as the pot arch, and
here its temperature is raised over some 10 to 14 days to about
800° C. During this operation the combined water is driven off,
the clay loses its property of plasticity and considerable contractions
take place. It is then transferred to the furnace in which the
glass is to be made, which has previously been reduced in temperature
to about 1000° C. These furnaces are normally designed
to hold two pots, and all the results quoted in this paper are based 
on pairs of pots manufactured simultaneously. After the pot has 
been put in the furnace, the temperature is raised gradually to the 
founding temperature of approximately 1400° C., at which stage
the raw materials for the manufacture of the glass are introduced. 
Several fillings are necessary because of the contraction of the material 
due to loss of combined gases, and the first found occupies some 3 
or 4 days, though subsequent founds are completed in 24 to 30 
hours, since the pot is never subsequently completely empty. The 
progress of fusion is watched by means of “proofs” consisting of
small pieces of glass taken on an iron rod, until the glass is seen to 
be free from bubbles—the exact point at which the founding opera¬ 
tion is stopped being decided on by experience. When the glass is 
presumed to be free from bubbles, the temperature is reduced to 
the working temperature of 1200° to 1300° C., and after some
hours, during which the temperature of the glass becomes uniform, 
the manufacture of the cylinders is commenced. A workman, 
known as a gatherer, dips an iron tube into the glass, gathering on 
the end of the iron a blob of molten glass. This is then allowed to 
cool until it is viscous enough to carry the weight of a second 
gathering. The pipe and its “gob” of glass are returned to the
furnace and the process is repeated three or four times until a sufficient 
amount of glass has been gathered on to the iron rod. This is then 
worked in a block into the appropriate shape required for the 
manufacture of a cylinder, and subsequently the pipe and its 
burden are transferred to the blowing machine. This machine 
allows the pipe to be rotated horizontally and swung vertically 
while feeding compressed air through an easily controllable valve 
to the end of the pipe. By suitable manipulation the glass is blown 
into the form of a long, closed cylinder the end of which is then 
opened, and the final result is a cylinder of glass about inch in 
thickness, 12 inches in diameter and 5 to 6 feet long attached to the 
pipe. This cylinder is then removed by cracking from the pipe and 
the tapering portion on the end cut off. The cylinder is subse¬ 
quently split and opened out in a subsidiary furnace into a large 
flat sheet, which is then slowly cooled in order to anneal it. These 
sheets of glass are transferred to the warehouse, cut up into pieces 





of an appropriate size for convenient handling and are examined 
for bubbles, known technically as “seed,” and for striae or veins.
The examination for seed is carried out by girls by means of an 
intense light projected through the edge of the sheet which enables 
the seed to be seen as bright specks against a dark background. 
Each of these seeds is marked, and the pieces are then sorted for 
veins by an optical method. The veins are then marked on 
the plate, and the subsequent operation consists in cutting out the 
material which is free from seed or veins, into circles or squares of 
the appropriate size for the optician. 

2. Statement of the Problem.

It is clear from what has been said that the glass which is sold 
is—as far as the tests applied will show—of perfect quality. The 
problem facing the manufacturer, therefore, is one of getting the 
highest proportion of glass free from seed and veins. Other essential 
requirements of a spectacle glass are constancy of refractive index 
and, in the case of coloured glasses, constancy of colour. It is clear 
from the complexity of the process to which the material is sub¬ 
jected that small-scale experiments are not likely to throw much 
light on the defects due to individual stages of the process and, in 
fact, many attempts have been made to control some of the variables 
on an experimental scale, but without success. It appeared to us 
that the only method which offered any prospect of separating the 
effect of the many variables was the statistical one. For several 
years, therefore, samples have been taken from certain specified 
cylinders in order to get a measure of the quality of these particular 
cylinders, since such an examination could be carried out with the 
minimum of delay, whereas it might be several weeks before the
complete result of a particular journey, i.e. one day's work, could
be obtained. The normal number of cylinders manufactured from
a pot is 18 to 20, and it was therefore decided to take samples from
the third, tenth and sixteenth cylinders. These samples consisted
of a strip about 8 inches wide down the length of the cylinder,
these pieces then being cut up into approximately 12-inch lengths.
Samples were marked in the normal way for seed and veins, and 
a count was taken of the number of seed. The length of the 
mark indicating veins was also recorded. From a large number of 
samples marked in this way an estimate of the number of circles of 
a standard size, i.e. 47 mm. in diameter, which could be cut was 
made by pasting on circular pieces of paper. It was found from a 
collection of these figures that a linear relation existed between the 
number of circles that could be pasted on the free surface of the 
plates and the sum of the number of seed plus half the length of 





veins per unit area. The average figure for the samples taken from 
each of the cylinders was then worked out, and this figure, reduced 
to a common unit of area for glass of a standard thickness, is what 
is recorded in the tables attached to this paper. 
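
The figure described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function names, the sample counts and the unit of area are hypothetical, and the reduction to a standard thickness mentioned in the text is omitted because the paper does not give it.

```python
def quality_figure(seed_count, vein_length, area):
    """Defect figure per unit area: number of seed plus half the vein length."""
    if area <= 0:
        raise ValueError("area must be positive")
    return (seed_count + 0.5 * vein_length) / area


def journey_average(cylinder_figures):
    """Average of the figures for the sampled cylinders of one journey."""
    return sum(cylinder_figures) / len(cylinder_figures)


# Hypothetical samples from the third, tenth and sixteenth cylinders:
# (number of seed, marked length of veins, area examined).
samples = [(30, 8.0, 1.0), (42, 12.0, 1.0), (55, 10.0, 1.0)]
figures = [quality_figure(s, v, a) for s, v, a in samples]
print(figures)
print(journey_average(figures))
```

A lower figure means better glass, since the number of circles that can be cut falls as the figure rises.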

Fig. 1 shows typical results for the first five journeys of the run 
of Glass B from Furnace No. 18, set on September 1st, 1934.*
The circles represent the three individual cylinder values, and the 
solid horizontal lines the average of these, with which the present 
analysis is mainly concerned. The dotted horizontal lines are the 
pot means. 

3. The Relation between the Seed and Vein Figures and Quality. 

The first relation considered was whether the figure for per¬ 
centage yield deduced from the counts of seed and veins agreed 
with the yield obtained in the warehouse for the same samples. 
It was therefore arranged that all the specimens, having been 
marked and counted, should be sent on, after flattening, for cutting 
up in the warehouse in the normal manner. The correlation co¬ 
efficient was worked out between the estimated production based on 
the laboratory methods and the actual amount obtained by cutting 
and weighing in the warehouse. The values obtained for the years 
1931-1933 were +0.88 ± 0.04, and for 1934 +0.82 ± 0.03. This
factor was considered to be sufficiently high to justify the general 
accuracy of the laboratory method of estimating quality, but it 
was found that the warehouse in general obtained only about 85
per cent. of the yield calculated as being possible. This was not
entirely unexpected, as cutting under commercial conditions cannot 
be expected to be as efficient as marking out circles under the 
unhurried conditions of a laboratory. It should be observed that 
this correlation does not indicate that the samples taken from the 
cylinders quoted represent fairly the quality of the whole of the 
glass from that particular journey, but this point is dealt with 
again later, when the variation in quality between the different 
cylinders is considered. 
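
As a sketch of the calculation behind such a coefficient, the following computes the product-moment correlation between estimated and actual yields. The yield figures are hypothetical, not the authors' data, and the error formula (1 - r^2)/sqrt(n) is an assumed large-sample approximation; the paper does not say how its ± values were obtained, and they may be probable errors rather than standard errors.

```python
import math


def pearson_r(xs, ys):
    """Product-moment correlation coefficient of two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need two series of equal length, at least 3 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def approx_standard_error(r, n):
    """Assumed large-sample standard error of r: (1 - r^2) / sqrt(n)."""
    return (1.0 - r * r) / math.sqrt(n)


# Hypothetical percentage yields: laboratory estimate and warehouse result.
estimated = [62, 70, 55, 81, 47, 66, 74, 59]
actual = [54, 58, 49, 70, 38, 57, 61, 52]
r = pearson_r(estimated, actual)
print(round(r, 2), round(approx_standard_error(r, len(estimated)), 2))
```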

4. Correlation of Seed and Veins between Pairs of Pots.

A preliminary inspection of the results of the counts for seed 
and veins in No. 1 and No. 2 pots of all the runs available, indicated 
that there was a tendency for both pots to be either good or bad 
simultaneously. This effect can be seen in Fig. 1. The correlation
coefficient was therefore worked out for all the pairs of pots in
which the same type of glass was being manufactured simultaneously,
irrespective of what the particular type of glass was.

[Fig. 1. Data for separate cylinders, Glass B, Furnace 18, set on
1.9.34; values for Pot 1 and Pot 2 plotted against journey number.]

* We are indebted to Professor E. S. Pearson, University College, London,
for the preparation of this diagram.

The values obtained were :—



               Seed.             Veins.

1931-33     +0.50 ± 0.08      +0.77 ± 0.04
1934        +0.47 ± 0.09      +0.60 ± 0.08









The first of these sets of figures was taken as indicating that in the 
case of veins about three-quarters of the variables controlling their 
production acted simultaneously on both pots, whereas for seed 
only half the variables affected both pots simultaneously. The 
implications of this statement are discussed later in the paper. 
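
The reading of a correlation coefficient as the share of common variables can be illustrated by simulation. The sketch below is not the authors' method: it assumes, purely for illustration, that each pot's figure is the sum of equally important independent factors, some shared by both pots and some peculiar to each. Under that assumption the expected correlation equals the shared fraction.

```python
import math
import random


def pearson_r(xs, ys):
    """Product-moment correlation coefficient of two equal-length series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate_pair_correlation(total_factors, shared_factors, n_pairs, seed=0):
    """Correlation between two pots when some factors act on both at once."""
    rng = random.Random(seed)
    own = total_factors - shared_factors
    pot1, pot2 = [], []
    for _ in range(n_pairs):
        common = sum(rng.gauss(0.0, 1.0) for _ in range(shared_factors))
        pot1.append(common + sum(rng.gauss(0.0, 1.0) for _ in range(own)))
        pot2.append(common + sum(rng.gauss(0.0, 1.0) for _ in range(own)))
    return pearson_r(pot1, pot2)


# Three of four hypothetical factors shared: correlation near 0.75.
r_shared = simulate_pair_correlation(4, 3, 20000)
print(round(r_shared, 2))
```

If the factors are of unequal importance the simple proportion no longer holds, so the "three-quarters" and "half" in the text are best read as shares of the variation rather than counts of variables.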
When the results had been obtained at the end of 1933 for these 
correlation coefficients, an examination was made of the records as 
to the previous history of the pots in question, and it was found 
that in the absence of any necessity for doing otherwise, in every 
case the No. 1 and No. 2 pot had been of the same origin, i.e. had 
been made at the same time in the same pot-room by the same 
pot-maker and of the same batch of clay. For the year 1934, 
therefore, a deliberate variable was introduced, in that care was 
taken that No. 1 and No. 2 pots should be as different as the supplies 
of pots could allow. In other words, pots were selected so that in 
all cases they were not made at the same time and as far as possible 
not in the same pot-room, although only one pot-maker's work was 
available. The introduction of this variable resulted in the appre¬ 
ciable change of the correlation coefficient shown for veins for the 
year 1934, whereas the factor for seed was not significantly altered. 
This result was taken as indicating that about one-quarter of the 
factors which controlled the production of veins in pots of spectacle 
glass were decided before the pot reached the pot arch, as all the 
pairs of pots, whether of the same origin or not, were arched in the 
same arches, and set at the same time under similar conditions. It 
must not be forgotten, however, that there still remains a 
significant correlation factor between pot and pot as regards veins, 
this being, in fact, still higher than the total correlation figure for 
seed. This emphasizes the fact that the remaining factors which affect 
both pots simultaneously are more important than the factors 
which have been eliminated by deliberate selection of pots of different 
origins. This is, perhaps, not surprising, as the differences in pots 
which have been introduced so far are relatively slight, as they are 
made from nominally the same clays by the same man under con¬ 
ditions as closely similar as may be. The variables throughout the 
manufacture may be summarized as follows :— 


Pot-making.    1. Pot-maker.
               2. Room and drying conditions.
               3. Clays.
               4. Age of pot at setting.

Arching.       5. Number of pots in arch.
               6. Position in arch.
               7. Heating conditions in arch.

Mixing.        8. Proportions of frit and cullet.
               9. Accuracy of batch mixing.

Founding.     10. Bottoms.
              11. Temperature conditions.
              12. Order and size of fillings.
              13. Settling.

Making.       14. Order in which pots are made.
              15. Personnel (gatherers and blowers).


As stated above, a deliberate variable was introduced in that 
different conditions of pot-making were tried. The factors which 
might have an effect on the relative quality of the two pots are 
items 6, 10, 12 (rarely), 13, 14, and 15. Item 13 would be
important only where 14 was operative, as in the case of most of
the pots considered two sets of men were working, so that the two
pots were blown more or less simultaneously, and 13 and 14 are
therefore constant. Item 15 was clearly of great importance, but
unfortunately no records existed for the years 1931-1933 which would
enable personnel to be referred to any particular pot, but for the
year 1934 this information was recorded, and the average quality 
produced by each set of men was worked out. No significant 
difference could be found between the various sets who were engaged 
on this manufacture, but it must be remembered that only the best 
blowers and gatherers are selected to make spectacle glass. The 
result only shows, therefore, that either the two sets are equally 
skilled or that the quality is independent of the men. 


5. Correlation between Veins and Seed in the same Pot. 

Correlation factors were next calculated for the relation between 
veins and seed in the same pot, with the following results :— 


Type of Glass.      1931-1933.          1934.

A                 +0.61 ± 0.07            —
B                 +0.05 ± 0.15       +0.07 ± 0.11
C                 -0.02 ± 0.19       -0.26 ± 0.27
D                 +0.04 ± 0.16       -0.30 ± 0.15
E                 +0.31 ± 0.23       +0.02 ± 0.20


In this table only the factor for glass A is significant, and it is 
interesting to note that glass A is of a distinctly different type from 
the other four glasses quoted. Unfortunately, there is another 
variable introduced in this case, in that glass A was made in covered 
pots, whereas the other four glasses were made mostly in open 
pots. A few results are available for glass B, made in covered pots, 
but they are insufficient to enable a definite answer to be given to 
















the question of whether the difference is due to open or covered 
pots or to the difference in mixture. As, for other technical 
reasons, manufacture is preferable in open pots this question had to 
be left unexplored. 

6. Variation of Seed and Veins during Runs.

It had been generally accepted that the older the pot the worse
the quality of spectacle-glass made. The data given have been
examined for evidence regarding this point, and in order to eliminate
differences due to changes in composition of the glass, the numbers
of seed and veins were calculated separately for runs of glasses of
the same type. The summaries of the data are given in Table I.
For glass A it will be seen that there is a slight indication that the
quality improves as far as the third journey, then goes back to
approximately the same value as that of the first journey, and then
remains constant until about the tenth journey. In the case of
glass B there is no substantial change in the quality either for
veins or seed as the life of the pot increases. Glass D shows some
slight falling off as regards veins, but this is a type which is so
rarely made during the early life of a pot that the evidence is
uncertain. Fig. 2 shows these results diagrammatically. The
horizontal lines indicate the region covered by the standard error
calculated from the whole set of data on the assumption that the
actual figures are a random sample of a much larger number of
data; the vertical bars stretch from (mean - standard error) to
(mean + standard error). There is, however, with the exception
of the third journey of type A, no significant difference between the
quality of any one journey and the means of all the journeys. This
result, which contradicted the generally accepted idea on the subject,
is of importance, as it enables an estimate to be made of the
difference in quality with different types of spectacle glass, since
without this proof of the independence of the quality on the number
of the journey, there would be doubt thrown on the results obtained
with glasses C and D, as these types are usually made relatively
late in the life of the pot.

Fig. 2.

Table I.

Mean Values of Seed and Veins against Number of Journey.

                               Seed.                 Veins.
Journey.    Number of     Mean.   Standard      Mean.   Standard
            Pots.                 Error.                Error.

                              Type A.
  1            10         70.1      9.5         18.3      3.5
  2             7         54.4      4.5         11.1      4.3
  3            11         39.8      2.5         10.6      2.0
  4            13         58.4      4.2         19.1      2.3
  5            12         78.0      9.2         27.6      5.3
  6            12         68.2      7.3         24.5      6.8
  7             8         64.6      6.3         19.4      4.8
  8             6         56.0     12.0         30.0     14.5
  9             6         77.3     12.0         25.3      9.2
 10             4         75.2     14.1         20.3      7.6
 Mean           —         63.7      2.8         20.4      2.4

                              Type B.
  1            26         58.4      4.7         10.2      1.4
  2            25         55.9      5.2         19.9      3.3
  3            23         50.3      4.3         16.3      3.5
  4            17         52.5      5.8         14.7      1.8
  5            15         55.1      6.3         18.1      3.4
  6            11         53.8      4.4          8.8      1.9
  7             9         52.1      5.1          8.8      2.3
  8             6         41.9      6.0          9.8      2.0
 Mean           —         53.4      2.3         13.3      0.5

                              Type D.
  1             3         56.7      7.2          7.3      2.3
  2             3         43.7      5.5          5.3      1.5
  3             4         62.2      8.5          7.6      2.2
  4             —           —        —            —        —
  5             5         53.2      9.0          6.0      1.4
  6            10         70.4      6.7         19.7      8.4
  7            14         65.0      5.2         20.6      3.5
  8            18         64.5      4.5         17.0      3.2
  9            11         63.7      5.7         25.7      5.1
 10             9         66.3      7.0         24.1      6.2
 Mean           —         61.6      2.5         16.7      1.9






7. Relation of Seed and Veins to Type of Mixture and Number of
Cylinders.

Since it has been shown that the life of the pot has little, if any,
effect on the quality of the glass for seed and veins, it is justifiable
to average all the figures for each particular type of mixture
irrespective of the journey during which it was made. This has been
done, and the results, given in Table II, show that there
are significant differences both for seed and veins between the
different types of spectacle glass. Glass A is significantly worse for
seed than glass B, and is also worse for veins than any of the mixtures
except E. It is also important to note that glass C shows, through
all the sets of figures, a marked superiority as regards seed over all
the other types of glass. This is particularly interesting, in that
the differences in composition between glass C and glass B are very
slight, being due only to a small amount of colouring material.

Table II.

Summary of Average Seed and Veins for each Type of Glass.

                      1931-1933.                           1934.
         No.      Seed.          Veins.        No.      Seed.          Veins.
Type.    made.  Mean.  Std.    Mean.  Std.     made.  Mean.  Std.    Mean.  Std.
                       Error.         Error.                 Error.         Error.

A         87    64.8   2.9     21.0   2.1       —       —     —        —     —
B         46    51.0   3.6      8.3   1.0       79     55.4   2.5     18.2   1.0
C         28    33.6   2.0     14.8   1.4       14     40.9   4.0     28.0   4.1
D         40    59.1   3.3     14.1   2.6       40     66.8   3.3     21.5   2.8
E         16    55.9   2.7     46.5   8.3       26     54.4   4.7     18.5   2.1

The average number of seed and veins for each of the third, tenth
and sixteenth cylinders is recorded for the various types of glass.
The results are given in Table III for the types of glass where
sufficient data existed. There is a definite falling off in quality
both as regards seed and veins for glass A. There is some indication
that glasses B and C deteriorate somewhat, but glasses D and E
are remarkably constant throughout the whole manufacture. As
the colour of the various glasses mentioned increased from A to E,
it may be that this change in quality has some relation to the colour,
although differences in viscosity also occur in the different types,
and whether this change is a function of the radiating powers or
the absolute viscosity we have been unable to decide.

Table III.

Seed and Veins for Various Types and Number of Cylinder: Means
and Standard Errors.

                   1931-1933.                           1934.
Type.    No. 3.      No. 10.     No. 16.      No. 3.      No. 10.     No. 16.

                                    Seed.
A       47.7  3.1   57.5  4.2   90.2  6.2       —           —           —
B          —           —           —         41.7  2.9   57.1  3.7   69.7  4.6
C       32.5  4.2   29.7  3.1   52.1  6.0    30.2  3.6   37.1  3.4   55.3  7.6
D       50.3  5.5   51.2  8.1   52.4  7.9    59.4  4.2   68.8  4.7   62.8  4.1
E          —           —           —         59.3  7.8   56.4  8.4   55.5  5.5

                                    Veins.
A       11.7  1.8   17.3  3.3   29.4  4.6       —           —           —
B          —           —           —         13.8  1.5   18.8  2.3   20.0  2.0
C       18.2  4.1   16.6  2.7   23.2  5.4    23.9  3.9   20.4  4.8   37.8  8.1
D          —           —           —         18.0  2.5   17.3  2.9   24.3  3.8
E       24.0  7.9   30.8  6.4   23.3  6.4    16.0  2.7   15.4  3.0   24.4  5.2

Detailed figures for journeys 1-5 of 5 runs are given in Table IV.
The data are entirely for glass B, except for the fifth run, pot 2,
journeys 4 and 5, where the glass is D. It is the fourth of these runs
that has been represented in Fig. 1.

Table IV.

Details of Seed and Veins for Five Runs.

                         Pot 1.                          Pot 2.
                  Seed.          Veins.           Seed.          Veins.
Cylinder.      No.  No.  No.   No.  No.  No.   No.  No.  No.   No.  No.  No.
                3.  10.  16.    3.  10.  16.    3.  10.  16.    3.  10.  16.

R. 1   J. 1     47   56  100     3    5    7    52   61   88    53   11    9
          2     55   89   93    41   25   30    49   62   97    33   16   11
          3     35   57   56    12   12    8    34   60   72    13   26   45
          4     78   67  113    16    6   39    47   93  118    13    7   55
          5     33   40  128     8    3    8    16   29  130    20    3    9

R. 2   J. 1     52   66   36     5    6    6    65   80   40     5    6    7
          2     21   61   49     0   19    6   122   97   79     4   51   27
          3     31   39   25     7    9   12    45   54   72     9    2    9
          4     43   72   52    24   27   16   109  120   80    19    8   15
          5     37   51   67     2   15    6    67   85   63     3   15    9

R. 3   J. 1     50   61   60     0    4   27    75  139  130    12   48   30
          2     33   27   49     3    9   15    46   58   63     0    6   16
          3     24   39   24     0   18    9    15   33   39     4    3   10
          4     18   18   43    10    7    7    22   16   19     4   15    2
          5     28   42   28    12   10    8    27   19   22     7    2   16

R. 4   J. 1     24   34   43     0    0    0    46   66   24     7    4   14
          2     24   43   42    24    7   22    40  117  105    18   28    6
          3     21   21   51     9   24   12    30   28   34    13   19   21
          4     21   69   48     3   13   24    36   64   53    19   19   15
          5     76   48   42    54   57   44    39   60   78    14   60   40

R. 5   J. 1     31   54   40     0   18    3    19   93   36     0  [?]   10
          2     34   24   46     9   12    2    16   12    2   [?]    7    9
          3    120  122  120     3    2    3    33   58  107     4   12   13
          4    109  119  120    22   22   25    25   63   90     8    7   19
          5     69   49   60    28   12    1    28   34   43    30   10    2

R. = Run. J. = Journey.
The five runs referred to are as follows:

Run            1          2          3          4          5
Furnace       18         18         18         18         20
Date set    16.2.34    23.5.34    12.6.34    1.9.34     6.12.34

The glass is type B throughout except for R. 5, J. 4 and 5, Pot 2, which is
type D.

8. Relation between Seed and Veins in different Runs.

The mean value and the standard error for No. 1 pot and No. 2
pot, respectively, have been tabulated for each run, and have been
summarized under the number of the furnace in Table V. Since
the type of glass made during the run varies, allowance had to be
made for that fact. This correction was applied by calculating the
weighted average for the type of glass constituting a run, and this
figure, together with the calculated standard error, is given in the
table on the line marked “calculated.” As an example of the
method used, the following calculation was carried out for the
eighth run in No. 18 furnace (set December 2nd, 1933) No. 1 pot
for seed. In this run were 3 pots of type B, 2 of type C, 3 of type
D and 3 of type E. The means for seed for these types for the
whole year were: B, 51.0; C, 33.6; D, 59.1; E, 55.9. The calculated
mean of 51 is obtained by summing 3 × 51.0 + 2 × 33.6 and
so on, and dividing the answer by the total number of journeys.
The standard error of the mean was worked out on similar lines.
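
The worked example above can be checked directly. The sketch below uses only the numbers quoted in the text; the function and variable names are illustrative. The corresponding standard error is not computed, since the paper does not state its formula.

```python
def weighted_mean(counts, means):
    """Mean of the type means, weighted by the number of pots of each type."""
    total = sum(counts.values())
    return sum(counts[t] * means[t] for t in counts) / total


# Eighth run in No. 18 furnace, No. 1 pot, seed (figures from the text).
counts = {"B": 3, "C": 2, "D": 3, "E": 3}
means = {"B": 51.0, "C": 33.6, "D": 59.1, "E": 55.9}

calculated = weighted_mean(counts, means)
print(round(calculated, 1))  # 51.4, quoted in the paper as 51
```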

From this calculated figure for the mean and standard error and
the measured values of the same two quantities, the ratio of the
difference between the means and the standard error of the differences
has been calculated. In the majority of cases this ratio is
found to be much less than 2, suggesting that there is no real
difference between the measured value for the run and what would
be expected on the assumption of a random distribution of the total
results obtained for the various types. In earlier years, where
no data existed concerning the type of pot used, no other conclusion
could be drawn than that on occasion pots were made that
gave better results for veins than the average. In the last year,
however, the discrepancies where the quality was significantly
different for veins can be referred to variations in the conditions of
manufacture or the pots. In six pots, two pairs in the years 1931-1933
and two pots in the year 1934, results were given which were
significantly different for veins from what would be expected. This
tends to confirm the deduction made earlier that veins are affected by
the source of the pot.

Table V.—Summary of Mean Values and Standard Errors for
Different Runs together with Weighted Averages from the Whole
Data and Significance Ratios.*

                        No. 1 Pot.                              No. 2 Pot.
Fur-   No.       Seed.            Veins.        No.       Seed.            Veins.
nace   of                                       of
No.    Pots.  M.    E.    R.    M.    E.    R.  Pots.  M.    E.    R.    M.    E.    R.

18       7    77   14.4         33    8.2         7    75    9.9         31   11.8
              65   10.4  0.68   21    7.2  0.88        63   10.4  0.84   21    7.2  0.72
         5    60    6.3         24    4.6        nil
              65   12.4  0.36   21    8.0  0.32
        10    35    4.3         24    3.8         5    55    5.5         32    7.7
              51    7.0  1.86   18    5.1  0.93        63   12.4  0.81   21    8.5  0.95
         9    48    7.4          9    1.6         9    32    4.9         10    1.9
              58    8.2  0.90   20    5.8  1.74        64    8.7  1.20   19    6.1  1.40
         3    38    7.0         18    2.2         8    55    7.6         38    9.0
              49    7.8  1.03   14    6.0  0.94        39    6.4  0.40   26    8.4  0.98
       [?]   [?]    3.6         19    4.0         8    60   11.6         14    2.8
             [?]    6.5  nil    20    6.7  0.04        65    6.2  0.76   19    6.2  0.73
         9    61    7.1          7    1.7         9    64   12.8          7    1.3
              65    9.1  0.33   21    6.8  2.00        65    9.1  0.06   21    6.8  2.02
        11    34    4.6         22    3.8         9    65    7.0         33    7.3
              51    5.5  0.42   21    6.0  0.04        51    6.0  1.52   20    6.3  1.33
         8    59    6.3         13    2.9         8    65    3.6         23    3.0
              55    8.3  0.39   17    4.2  0.39        55    8.3  1.10   17    4.2  1.13
         5    45    5.3          6    1.5         5    53   11.3          7    1.4
              59   11.0  0.42   19    6.0  2.12        57   11.6  0.25   19   [?]   2.12
         6    30    7.6         13    2.3         6    82    7.8         12    2.9
              55    9.3  0.42   17    4.9  0.74        60    8.7  1.88   18    3.2  1.01
        12    57    8.2         12    2.6        12    57   10.0         16    3.7
              55    6.3  0.19   20    3.8  1.79        55    6.3  0.17   20    3.8  0.77
         3    60    8.4         23    6.4         3    62   13.2         12    3.3
              55   13.3  0.32   17    6.9  0.88        55   13.3  0.37   17    6.9  0.23
         8    47    4.4         27    6.1         8    54    5.3         25    3.3
              58    7.7  1.25   18    4.5  1.19        58    7.7  0.43   18    4.5  1.23
        10    44    3.6         47    8.5        10    49    5.8         32    3.3
              55    6.6  1.47   22    4.7  2.38        55    6.6  0.69   22    4.7  1.72

19       8    73   11.1         31    7.0         8    67    8.5         31    6.8
              61    9.2  0.97   20    6.4  1.16        61    9.2  0.48   20    6.4  1.18
         6    67    8.9         18    3.5         6    56    7.4         18    6.5
              65   11.2  0.14   21    7.8  0.36        65   11.2  0.67   21    7.8  0.30
         8    40    7.7          7    1.8         8    47    3.8         11    3.0
              52    8.5  1.05    9    3.0  0.57        52    8.5  0.54    9    3.0  0.47
         3    81   22.6          6    1.1         2    62   12.0          5    2.1
              52   13.4  1.02   10    6.3  0.65        55   16.0  0.35   11    8.8  0.66
         8    48    8.2          3    0.7         8    61    8.3          5    1.2
              51    8.7  0.24    8    2.5  1.22        51    8.7  0.83    8    2.5  1.06

20       8    40    2.8          5    0.9         8    46    7.4          7    1.5
              46    5.8  0.94   14    4.3  2.07        46    5.8  nil    14    4.3  1.37
         9    51    5.3         21    3.6        10    53    6.3         21    4.2
              61    7.4  1.10   19    4.4  0.35        61    7.0  0.64   19    4.1  0.33
         5    60    9.5         29    5.8         5    54    5.8         37    5.6
              51    8.9  0.69   22    6.3  0.81        51    8.9  0.28   22    6.3  1.78
        13    71    7.3         13    2.0        13    59    4.8         13    1.8
              59    6.1  1.25   19    3.9  1.36        59    6.1  nil    19    3.9  1.40

* The second line in each case shows the calculated values.
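
The significance ratio used in this section can be sketched as follows. The paper describes it only in words; the sketch assumes that the standard error of the difference is the root of the sum of squares of the two standard errors, which reproduces one of the seed entries of Table V (measured mean 60 with standard error 6.3, calculated mean 65 with standard error 12.4, ratio 0.36). The function name is illustrative.

```python
import math


def significance_ratio(measured_mean, measured_se, calculated_mean, calculated_se):
    """Difference of the means divided by the standard error of the difference.

    Assumes the two estimates are independent, so that their standard
    errors combine as the square root of the sum of squares.
    """
    se_difference = math.hypot(measured_se, calculated_se)
    return abs(measured_mean - calculated_mean) / se_difference


ratio = significance_ratio(60, 6.3, 65, 12.4)
print(round(ratio, 2))  # 0.36, well below the level of 2 taken as significant
```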

9. Variation of Refractive Index. 

In Table VI the average refractive index for each type of glass
made during 1934 is shown, together with the standard deviation.

Table VI.

Mean Refractive Index and Standard Deviation for Various Types.

Type.               No. of Pots.    Refractive Index    Standard
                                    “D” line.           Deviation.

B  1st mixture          29            1.52285             0.00036
   2nd mixture          46            1.52308             0.00043
C                       14            1.52296             0.00030
D                       40            1.52286             0.00030
E  1st mixture          10            1.52397             0.00021
   2nd mixture          10            1.52283             0.00060


In the case of glasses B and D small changes in mixture for the 
purpose of correcting refractive index were made during the year, 
and in these cases the average is given for each of the mixtures 
used. The figures for standard deviation, therefore, represent the 
normal variation from journey to journey which takes place with 
the same mixture. For some customers a limit of ± 0.0005 is 
allowed, and this implies rejection, on the figures shown, of 18 per 
cent. of the pots made. More generally a tolerance of ± 0.001 is 
given, in which case only 1 per cent. of the pots made have to be 
rejected. When the requirements for the optical accuracy of 
spectacles are considered, a tolerance of ± 0.001 seems to be ample, 
and there is little case for insisting on narrower limits. 
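
The order of magnitude of these rejection percentages can be checked if the refractive index of successive pots is assumed to follow the normal law about the target value. The Python sketch below rests on that assumption, which the paper does not state explicitly; the function name `fraction_rejected` is introduced here for illustration, and the two standard deviations are those of glass B in Table VI.

```python
import math

def fraction_rejected(tolerance, std_dev):
    """Two-sided normal tail area outside (mean - tolerance, mean + tolerance).

    Assumes a normal distribution centred on the target index.
    """
    z = tolerance / std_dev
    return math.erfc(z / math.sqrt(2.0))

# Standard deviations of glass B, first and second mixtures (Table VI).
for sd in (0.00036, 0.00043):
    for tol in (0.0005, 0.001):
        share = fraction_rejected(tol, sd)
        print(f"sd={sd:.5f}  tolerance={tol:.4f}  rejected={share:.3f}")
```

Under this assumption the narrower limit rejects something like a sixth to a quarter of the pots and the wider limit about 1 or 2 per cent., which is of the same order as the figures quoted in the text.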

The authors' acknowledgments and thanks are due to the 
Directors of Messrs. Chance Brothers and Co., Limited, for permission 
to publish this paper. 















1936] to the Manufacture of Spectacle Glasses. 151 


APPENDIX. 


Data Relating to Manufacture of Spectacle Glasses 1931. 



Data Relating to Manufacture of Spectacle Glasses 1931 (continued). 

                                      No. 1 Pot.               No. 2 Pot.
Date Set.   Furnace.   Journey.   Type.   Seed.   Veins.   Type.   Seed.   Veins.
21.9.31        18          1        A       95      28       A      120      29
open                       4        C       33      11       C       67      25
                           5        C       40       8       C       42?     13?
                           6        C       42      18       C       42      11
                           7                                 D      108       9
                           8        D       42       8       D       29      11
                           9        D       38      25       D       40      14
                          10        E       38      38       E       45      11
12.10.31       19          1        A       76       9       A       59       3
covered                    2        A       44      17       A       45       9
                           3        A       44      16       A       41       4
                           4        A       51      19       A       45      18
                           5        A       84      36       A       50       ?
                           6        A      101      12       A       94      48
22.11.31       18          1        A      116       -       A       27       5
covered                    2        A       63       -       A       45       2
                           3        A
                           4        A       48      12       A       40       ?
                           5        A       56      13       A      160
                           6        A       74       5       A       84       5
                           7        A       52       8       A       77      13
                           8        A       43       5       A       35       4
                           9        A       47       4       A       63       9
                          10        A       50       4?      A       45       8





Data Relating to Manufacture of Spectacle Glasses 1933. 

                                      No. 1 Pot.               No. 2 Pot.
Date Set.   Furnace.   Journey.   Type.   Seed.   Veins.   Type.   Seed.   Veins.
8.2.33         19          1        B       17       2       B       40       4
open                       2        B       29       2       B       41       4
                           3        B       38       4       B       41       3
                           4        B       46      11       B       37      25
                           5        B       23      18       B       50      25
                           6        B       44       4       B       65      12
                           7        B       28       8       B       38       9
                           8        D       92       4       D       65       5
16.4.33        19          1        B       34       9       B       45       8
open                       2        B      130       5
                           3        D       78       5       D       79       2
14.8.33     Optical        1        B       66       8       B       40       2
open                       2        B       32       3       B      102       9
                           3        B       34       7       B       56      13
                           4        B       30       5       B       39       6
                           5        D       51       8       D       30       5
                           6                76       2       D       97       1
                           7        D       63      16       D       83       7
                           8        D       74       6       D       61       1?
                           9        E       66      63       E       56      26
                          10        E       69     126       E       40      74
                          11        E       57     107       E       64      72
13.10.33       19          1        B       25       2       B       56      12
open                       2        B       37       8       B       83       3
                           3        B       46       3       B       65       5
                           4        B       38       5       B       67       8
                           5        B      106       6       B      108       2
                           6        B       34       3       B       39       1
                           7        B       44       5       B       41       6
                           8        B       55       6?      B       33       3
10.11.33       20          1        C       47       3       C       29       6
open                       2        C       37       3       C       20      14
                           3        C       26      10       C       26       9
                           4        C       36       7       C       27       7
                           5        D       33       -       D       80       2
                           6        D       40       4       D       69       1
                           7        D       51       6       D       58      10
                           8        D       48       3       D       58      11
2.12.33        18          1        B       81      18       B       95      12
open                       2        B       46      33       B       56      24
                           3        B       45      11
                           4        C       31       7       C       25      14
                           5        C       41      13       C       55      24
                           6        D       71      12       D       67      14
                           7        D       55      15       D       68      13
                           8        D       77      35       D       96      33
                           9        E       48      48       E       67      47
                          10        E       58      33       E       69      28
                          11        E       45      15






Data Relating to Manufacture of Spectacle Glasses 1934. 

                                          No. 1 Pot.                              No. 2 Pot.
Date Set.   Furnace.   Journey.   Type.   Seed.   Veins.   Refractive   Type.   Seed.   Veins.   Refractive
                                                           Index.                                Index.
16.2.34        18          1        B       68       5      1.5222        B       67      24      1.5223
                           2        B       79      32      1.5227        B       69      20      1.5226
                           3        B       49      11      1.5232        B       55      28      1.5228
                           4        B       86      20      1.5230        B       86      25      1.5232
                           5        B       67       6      1.5230        B       58      11      1.5228
                           6        B       42      11      1.5227        B       51      25      1.5231
                           7        B       28      19                    E       72      12      1.5230
                           8        E       51      15                            59      39
10.4.34        18          1        D       45       4      1.5230        E       95       3      1.5241
                           2        D       33       2      1.5230        E       67       8      1.5237
                           3        E       32      10      1.5237        E       33      12      1.5242
                           4        E       64       7                    E       44       8
                           5        E       51      10                    E       25       5
23.5.34        18          1        B       31       6      1.5230        B       62       6      1.5228
                           2        B       44      12      1.5228        B       99      27      1.5231
                           3        B       32       9      1.5228        B       57       7      1.5231
                           4        B       56      22      1.5239        D      103      14      1.5231
                           5        B       52       8      1.5219        D       72       9      1.5227
                           6        B       86       9      1.5224        D      100      10      1.5228
12.6.34        18          1        B       57      10      1.5230        B      115      30      1.5231
                           2        B       36       9      1.5228        B       56       7      1.5229
                           3        B       29       9      1.5226        B       29       6      1.5228
                           4        B       26       8      1.5225        B       19       7      1.5228
                           5        B       33      10      1.5226        B       23       8      1.5229
                           6        B       61      10      1.5228        B       47       4      1.5229
                           7        C       43       8      1.5230        C       53      11      1.5233
                           8        C       29       6      1.5232        C       13      48      1.5226
                           9        D       95      16      1.5225        D       73       8      1.5228
                          10        D      113       6      1.5227        D       83       9      1.5228
                          11        E       81      13      1.5238        E      123      24      1.5237
                          12        E       86      40      1.5240        E       45      25      1.5241
20.7.34        20          1        D       74      13      1.5224        D       51       5      1.5230
                           2        D       42       8      1.5226        D       56       6      1.5229
                           3        D       53      12      1.5234        D       39      12      1.5232
                           4        E       32      25      1.5242        E       99      18      1.5242
                           5        E       24      16      1.5238        E       32      43      1.5235
                           6        E       45      33      1.5235        E       28      16      1.5233
                           7        D       57      40      1.5234        D       53      35      1.5233
                           8        D       72       9      1.5232        D       76      24      1.5229
                           9        D       60      29      1.5230        D       56      41      1.5231
                          10                                              D       63      32
26.7.34        18          1        B       80      13      1.5235        B       93      14      1.5235
                           2        B       46      40      1.5224        B       39      29      1.5242
                           3        B       54      23      1.5231        B       53      21      1.5234





Data Relating to Manufacture of Spectacle Glasses 1934 (continued). 

                                          No. 1 Pot.                              No. 2 Pot.
Date Set.   Furnace.   Journey.   Type.   Seed.   Veins.   Refractive   Type.   Seed.   Veins.   Refractive
                                                           Index.                                Index.
23.8.34        20          1        B       90      10      1.5233        B       52      19      1.5233
                           2        B       29      17      1.5231        B       35      58      1.5233
                           3        B       77      36      1.5238        B       75      34      1.5234
                           4        C       51      35      1.5232        C       57      40      1.5232
                           5        C       55      47      1.5233        C       51      36      1.5232
1.9.34         1?          1        B       34       0?     1.5231        B       45       8      1.5233
                           2        B       38      18      1.5230        B       87      17      1.5231
                           3        B       31      15      1.5232        B       31      18      1.5237
                           4        B       46      13      1.5232        B       31      18      1.5235
                           5        B       69      52      1.5234        B       59      38      1.5233
                           6        D       42      27      1.5231        D       48      33      1.5233
                           7        D       53      41      1.5228        D       56      33      1.5228
                           8        D       62      48      1.5230        B       55      32      1.5232
14.10.34       18          1        B       48      18      1.5235        B       84      11      1.5235
                           2        B       53      64      1.5233        B       57      42      1.5234
                           3        B       44       ?      1.5232        B       33      20      1.5232
                           4        B       35      19      1.5230        B       52      24      1.5234
                           5        B       40      22      1.5228        B       43      24      1.5229
                           6        C       53      10      1.5225        C       22      19      1.5231
                           7        C       20      40      1.5237        C       51      21      1.5223
                           8        C      n.s.   veiny     1.5227        C       34      42      1.5229
                           9        D       51      73      1.5228        D       not sorted      1.5229
                          10        D       56      67      1.5227        D       61      37      1.5227
6.12.34        20          1        B       42       7      1.5229        B       49       5      1.5228
No. 1                      2        B       35       8      1.5224        B       38       6      1.5223
pot lined                  3        B      120       3      1.5223        B       66      10      1.5225
                           4        B      116      23      1.5224        B       59      21      1.5224
                           5        B       59      23      1.5230        B       36       8      1.5226
                           6        B       68      11      1.5224        B       54       7      1.5227
                           7        D       95       5      1.5224        D       80      21      1.5225
                           8        D       99      10      1.5224        D       72      15      1.5225
                           9        D       69      15      1.5227        D      101      22      1.5225
                          10        D       69       4      1.5227        D       68      13      1.5228
                          11        E       48      17      1.5226        E       53      20      1.5223
                          12        E       49      18      1.5227        E       42      17      1.5224
                          13        E       53      25      1.5221        E       52      13      1.5221




156 


Discussion 


[No. 2, 


Discussion on Mr. Gould’s and Dr. Hampton’s Paper. 

Professor Pearson : When reviewing its position last summer, 
the Committee of this Section felt that rather few of the papers so 
far presented at its meetings had been of a kind to catch the interest 
of the practical man in industry, who had little or no knowledge of 
statistical technique and terminology. We therefore decided to 
arrange some discussions that would centre round actual problems 
presented by industrialists which had arisen in the course of the 
production work with which they had been concerned. This paper, 
given us to-day by Dr. Hampton and Mr. Gould, is precisely of the 
kind which we wanted, and on behalf of all those who are interested 
in the development of this field of applied statistics I would like to 
convey to the authors a few words of appreciation for what they 
have done. Let me assure them that we statisticians have all of us a 
very genuine feeling of admiration for those who, in the press of 
a hundred and one other problems that arise in their everyday business, 
nevertheless find time to persist in the study and application 
of unfamiliar scientific tools. Can we, I wonder, on our side do 
something to make the understanding of those tools easier ? 

The paper that Messrs. Dudding and Jennett read before this 
Section in December was dealing mainly with the results of 
experimental tests carried out under the controlled conditions of 
a research laboratory. The interest of the present paper lies in the 
fact that it is dealing with a problem which has arisen under production 
conditions where it is quite impossible to plan the collection 
of data in a way which would make statistical analysis most easy. 
The pots, for example, must be filled with the glass that is needed 
at the moment, and the cost of the completion of a carefully balanced 
experiment using glass which is not wanted would be prohibitive. 
The authors have therefore had to attack data which are extremely 
complex, and have calculated what might be termed certain overall 
means, standard deviations and correlation coefficients, some of which, 
as they have frankly admitted, are difficult to interpret. 

While I believe that even the more intractable portions of their 
material could tell us something more if submitted to recently 
developed methods of analysis, I shall purposely confine my remarks 
to one part of the data from which a set of very nicely balanced 
observations can be taken. In doing this I may be criticized for 
avoiding difficulties, but I am certain that there is first so much 
explanatory work to be done in making clear the statistician’s method 
of dealing with the simpler problems, that it is better not to attempt 
too much at once. 

The data chosen have been extracted from Dr. Hampton's 1934 
records and are contained in his Table IV. They are exactly 
balanced in the sense that both for seed and veins 150 observations 
are available, representing tests on samples from three cylinders 



157 


1936] on Mr. Gould's and Dr. Hampton's Paper . 

taken from each of two pots on the first five journeys of five different 
runs.* 

Fig. 1, from p. 141 above, represents the observations in the 
fourth of these five runs. There is clearly seen to be very considerable 
variation among the individual measures of both seed and 
veins: our objective must be to examine whether, and to what 
extent, parts of this total variation can be associated with certain 
stages or factors in the production process. The procedure of 
Analysis of Variance consists primarily, as its author R. A. Fisher 
has pointed out, in an arrangement of the data in a form most helpful 
in this process of examination. Such an arrangement is shown 
in the analysis of variance tables (Table VII) below, and I have 


[Fig. 3. Illustration of analysis into parts. Glass B, set 1.9.34, journeys 1 to 5: seed counts for Pot 1 and Pot 2, with means.] 


prepared Fig. 3 to help in making clear the connection between the 
sums of squares given in these tables and the plotted points of Fig. 1. 

In the first place, we shall suppose that if x represents, say, an 
observation of seed from a particular cylinder, then this quantity 
may be regarded as built up of a number of additive parts, thus 

x = α + β + γ + δ + . . . + u. 

Here α is the grand mean for the glass used, β, a term common 
to all samples of glass from a given run, represents the amount by 
which the run mean differs from the grand mean of the whole, γ a 

* The whole of the analysis was carried out, under the direction of Mr. 
B. L. Welch, by students in the Department of Applied Statistics at University 
College, London. In choosing the data it was believed that all the tests were 
for glass B; afterwards it was realized that in two journeys of one run, one 
of the pots contained glass D, but we do not think that this slip vitiates the 
results obtained, as the particular observations of D are quite typical of B. 












similar term common to all samples from a particular pot, but 
varying from pot to pot, δ a term common to all three cylinders 
of a journey, and so on. Finally u is a random residual which is 
assumed to follow the normal law of variation. Fig. 3 shows how 
estimates of these contributory parts are obtained by calculating 
appropriate mean values from the seed data of Fig. 1. 

In this diagram the black circles and solid horizontal lines are 
precisely those of the upper half of Fig. 1, representing the individual 
cylinder counts and the means of each set of three. The other lines 
and bars in the diagram may be imagined superposed in turn as 
follows. At each stage we subtract α, β, γ, etc., from x until we 
are finally left with the residual, u. 

(a) First we insert the grand mean for the 150 observations 
(5 runs X 2 pots X 5 journeys X 3 cylinders) which is our 
estimate of a, and is represented by the dotted lines running 
across the 2 charts at 56.09. 

(b) The mean seed-count for the run represented (run 4) 
is below the grand mean (i.e. at 47.8), and is shown by another 
dotted line. The height of the black block represents our 
estimate of β for this run. The sum of the squares of these 5 
black contributions, one for each run, multiplied by 30 (since 
the 30 observations in the run have the same contribution β) 
is the figure 14,059 in the first line of Table VII. 

(c) Pot 1, with a mean of 40.9, had a seed-count below, 
and pot 2, with 54.7, a seed-count an equal distance above 
the average for the run. Dotted lines show these two levels, 
and hatched blocks represent the estimates of the pot-contributions, 
± γ. The sums of squares of these 5 pairs of pot-contributions 
multiplied by 15 (since the 15 samples from a pot have the 
same contribution γ) are shown in the second line of Table VII. 

(d) There are now contributions peculiar to each pot-journey; 
these are shown in the figure by the cross-hatched 
blocks and represent estimates of the δ's. There are 10 different 
blocks for each run, and the sum of all these squares multiplied 
by 3 (since the 3 samples from each pot-journey have the same 
contribution δ) is 64,081, or the total of the third, fourth and 
fifth lines of Table VII. Though, for simplicity, this has not 
been shown in the diagram, the cross-hatched contributions may 
be usefully further analysed. 

We may ask, (i) How far is there a systematic journey 
trend in quality common to all runs? Does the seed-count, 
for example, during the first 5 journeys of all runs under consideration 
increase with the age of the pot? (ii) Is there evidence 
of a journey trend common to the two pots side by side in the 
furnace, but not common to different runs? (iii) Besides one 
or both of these effects, are there significant fluctuations 
peculiar to individual journeys of one pot, i.e. only recorded 
for the 3 cylinders sampled from a particular filling of a pot? 
The sums of squares associated with these possible contributions 
are shown in lines 3, 4 and 5 respectively of Table VII. 




(e) We may now turn to differences between the 3 cylinders 
sampled from the same pot. Are these systematic? For the 
run shown in Fig. 3 the third cylinder shows the best quality, 
the tenth the worst and the sixteenth an intermediate value 
(lying nearest to the tenth); the average seed-counts are 
35.7, 55.6 and 52.0 respectively. This effect, in the form of 
contributions subtracted from or added to the pot-journey 
means, is shown by the dotted bars in the diagram. If these 
lengths (3 different ones for each run repeated 10 times within 
the run) are squared and added, we obtain the total of the sums 
of squares in lines 6, 7 and 8 of Table VII, i.e. 22,257. 
Far the greatest part of this contribution comes from the 
cylinder differences in the first run (line 6). The trend in the 
other four runs is very similar to that shown in the diagram 
(for run 4), and gives a common part (line 7) and parts associated 
with the four runs separately (line 8), the latter not being 
significant. 

(f) Finally we are left with the residuals, or differences 
between the observation circles and the end of the dotted bars. 
The sum of the squares of these (150 in all) is shown in line 9 
of Table VII, amounting to 27,227. 
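
The successive subtractions in (a) to (f) can be sketched in Python for the purely nested part of the scheme (runs, pots within runs, pot-journeys within pots, residual). This is a simplified illustration and not the computation actually carried out for Table VII: the further splitting of journey and cylinder effects is omitted, the function names are introduced here, and the counts in `example` are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)

def decompose(data):
    """data[run][pot][journey] is a list of cylinder counts.

    Each observation x is written as grand + beta + gamma + delta + u,
    and the squared parts are accumulated over all observations.
    """
    all_obs = [x for run in data for pot in run for journey in pot for x in journey]
    grand = mean(all_obs)
    ss = {"runs": 0.0, "pots": 0.0, "pot_journeys": 0.0, "residual": 0.0}
    for run in data:
        run_mean = mean([x for pot in run for journey in pot for x in journey])
        for pot in run:
            pot_mean = mean([x for journey in pot for x in journey])
            for journey in pot:
                journey_mean = mean(journey)
                for x in journey:
                    beta = run_mean - grand          # run contribution
                    gamma = pot_mean - run_mean      # pot contribution
                    delta = journey_mean - pot_mean  # pot-journey contribution
                    u = x - journey_mean             # residual
                    ss["runs"] += beta ** 2
                    ss["pots"] += gamma ** 2
                    ss["pot_journeys"] += delta ** 2
                    ss["residual"] += u ** 2
    total = sum((x - grand) ** 2 for x in all_obs)
    return grand, ss, total

# Hypothetical counts: 2 runs x 2 pots x 2 journeys x 3 cylinders.
example = [
    [[[34, 40, 46], [38, 52, 45]], [[45, 60, 58], [87, 70, 66]]],
    [[[68, 75, 80], [79, 66, 71]], [[67, 59, 72], [69, 81, 77]]],
]
grand, ss, total = decompose(example)
print(grand, ss, total)
```

Adding a squared contribution once for every observation that shares it is the same as multiplying the squared block height by 30, 15 or 3 in the text; the parts then add up to the total sum of squares about the grand mean.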

The second part of Table VII contains results from a similar 
analysis of the observations for veins, while the third shows the 
covariance or sums of products of corresponding contributions for 
seed and veins. 

A first question which will naturally be asked is, how far this 
breaking up into parts is anything but artificial. Clearly a construction 
of this kind could be forced on any set of completely random 
data merely by taking the means of appropriate groups. Here it 
can be answered that the statistical method of arrangement has 
been such that if there were no consistent differences in quality 
associated with different runs, pots, journeys and cylinders, then 
the figures in lines 1-8 of Table VII in “the mean square” column 
would only differ by chance fluctuations from the residual mean 
square of line 9. Since on reference to appropriate statistical tables 
the former are in almost all cases significantly larger than the 
latter, there is clear evidence pointing to some assignable causes of 
variation. 

It is from a study of this or similar tables in which the sums of 
squares are related, in the manner described, to the “parts” in Fig. 
3, that the statistician feels his way in interpreting rather complex 
data. To attempt any detailed interpretation here would occupy 
far too much time, but a few brief comments may help to complete 
the picture I have tried to give you of the statistical approach. 

Residual. 

After the various contributions referred to have been subtracted 
there is still remaining a very considerable residual variation. For 
seed the total variance of 971, giving a standard deviation of 31, 
has been reduced to a residual variance of 303, or standard deviation 



17. Part of this last is, no doubt, due to fluctuations arising in the 
sampling of the cylinders, since presumably the count recorded 
was made on only a portion of the glass, sampled from the split and 
flattened cylinder. One might expect (on the basis of the Poisson law 
which applies in similar counting problems) that if m were the average 
number of seed per unit area, and x the number actually counted 
in a single unit of area, then the sampling variance of x due to this 
cause would be about equal to m. For these observations m lies 
between 50 and 60, i.e. is far less than the residual variance of 303. 
Part of the large difference may perhaps be due to the variation 
in the manipulative skill of the operative in removing the glass 
from the furnace and in handling the cylinder. 
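
The comparison with the Poisson law can be put as a single ratio. In the Python sketch below the name `dispersion_index` is introduced for illustration; the grand mean of 56.09 is used as a stand-in for m, on the assumption that the count was made on one unit of area.

```python
def dispersion_index(observed_variance, mean_count):
    """Observed variance divided by the variance a Poisson count would have.

    For a Poisson variable the variance equals the mean, so a value near 1
    would mean counting error alone accounts for the variation.
    """
    return observed_variance / mean_count

residual_variance = 303.0  # residual mean square for seed, Table VII, line 9
grand_mean = 56.09         # grand mean seed-count of the 150 observations
print(round(dispersion_index(residual_variance, grand_mean), 1))
```

The residual variance is thus several times what counting error alone would be expected to give, which is the gap the speaker attributes in part to handling.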

For veins, the residual variance (95) represents a larger proportion 
of the total variance (169) than for seed, or, in other words, the analysis 
has there been less successful in associating parts of the total variation 
with pot, journey, cylinder factors, etc. 

Cylinders. 

A marked result for seed which the more detailed statistical 
analysis has brought out and which was masked in the overall 
averages of Gould's and Hampton's Table III, is the difference 
between the cylinder trend in the first run and the trends in the other 
four runs, which are very much alike. It looks as though in run 1 
something exceptional happened, causing the sixteenth cylinders in all 
journeys to have high seed values; possibly some assignable cause 
for this form of variation can be suggested. The cylinder trend 
for runs 2-5, represented by the averages 35.7 (third cylinder), 
55.6 (tenth cylinder), 52.0 (sixteenth cylinder), is definitely significant 
for seed, as shown by a comparison of lines 7 and 9 of Table VII. 
It is also consistent for the four runs, as shown by the fact that the 
interaction term of line 8 is not significant. The trend for veins is 
consistent but much less marked. Without data from the intermediate 
cylinders, we are necessarily rather in the dark as to the 
general trend in quality as the pots are emptied. 

Journeys. 

An important part of the total variation is represented by the 
cross-hatched blocks in the diagram (lines 3, 4, and 5 in Table VII). 
This arises from differences between the quality of glass from different 
journeys, after allowance has been made for any differences between 
whole runs and between pots. 

(i) A part of these journey differences may be attributed to 
changes which are repeated in all the runs; in other words, to some 
systematic trend in quality from journeys 1 to 5. For the 150 
observations the mean seed-counts for journeys are No. 1, 58.9; 
No. 2, 55.7; No. 3, 50.0; No. 4, 64.8; No. 5, 51.3. This effect is not, 
however, very marked, and there is not sufficient evidence to conclude 
that the same trend would be repeated in other runs, e.g. might 
be regarded as a consistent trend associated with ageing of a pot. 
(These remarks apply, of course, only to glass B and journeys 1-5.) 





(ii) A much larger part of these journey differences (line 4, 
Table VII) is represented by changes which are shared by the glass 
in both pots in a furnace, but are not repeated from one run to another. 
Thus in Fig. 3 (representing the run of September 1st, 1934) the glass 
from journey 1 has a seed-count which in both pots is below the 
average; but in the run of June 12th, 1934 (Table IV) it is found 
that the glass from journey 1 is in both pots above the average. 

Similar journey effects common to glass from both pots used in 
a run, but not repeated from run to run, are found for veins. These 
effects in seed and veins are also correlated, as Mr. Welch points out 
below. 

(iii) It cannot, however, be said that the part of the variation 
common to the 3 cylinders can be represented completely by adding 
to the run mean a term appropriate to the pot plus a term appropriate 
to the journey. This is shown in Table VII by the existence 
of an interaction for pots and journeys (line 5), which, although 
it is only just significant (in comparison with line 9), is almost 
certainly real, because there is a positive correlation between these 
parts for seed and veins. 

Table VII. 

                                                          Degrees       (1) Seed.           (2) Veins.       (3) Seed and Veins.
Source of Variation.                                      of          Sum of     Mean     Sum of     Mean     Sum of     Correlation
                                                          Freedom.    Squares.   Square.  Squares.   Square.  Products.  Coefficients.
 1. Between runs                                              4        14,059    3,515     2,469      617        804       0.1365
 2. Between pots (within runs)                                5        17,031    3,406       580      116      1,650       0.5252
 3. Between journeys (common to all runs)                     4         4,355    1,089     1,150      288  }
 4. Between journeys (within runs, less common part above)   16        45,934    2,871     7,710      482  }   9,257       0.4386
 5. Interaction (pots and journeys within runs)              20        13,792      690     3,012      151      2,294       0.3560
 6. Between cylinders in run 1                                2        15,826    7,913       704      352  }
 7. Between cylinders (common to runs 2-5)                    2         4,711    2,355       792      396  }   3,217       0.5108
 8. Interaction (cylinders and runs 2-5)                      6         1,720      287       286       48  }
 9. Residual                                                 90        27,227      303     8,513       95        831       0.0546
10. Total                                                   149       144,655      971    25,216      169     18,053       0.2989

Note. The figures in the column headed “mean square” are obtained by dividing the 
“sum of squares” by the “degrees of freedom.” The “correlation coefficients” are obtained 
by dividing the “sum of products” by the geometric mean of the appropriate “sums of squares.” 
The “interaction,” if significant, represents a differential effect. Thus the contribution to the 
sum of squares in line 5 represents that part of the variation common to the 3 cylinders which 
cannot be represented by adding to the run mean a term appropriate to the pot plus a term 
appropriate to the journey. Its significance would be tested by a comparison with the residual 
sum of squares of line 9. 
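
The two rules given in the note can be tried on line 5 of the table. The Python sketch below uses only the line 5 and line 9 entries; the function names are introduced here for illustration, and the final ratio is simply the quotient of two mean squares, with no reference to significance tables.

```python
import math

def mean_square(sum_of_squares, degrees_of_freedom):
    """Sum of squares divided by its degrees of freedom."""
    return sum_of_squares / degrees_of_freedom

def correlation(sum_of_products, ss_seed, ss_veins):
    """Sum of products divided by the geometric mean of the two sums of squares."""
    return sum_of_products / math.sqrt(ss_seed * ss_veins)

# Line 5: interaction of pots and journeys within runs.
ss_seed, ss_veins, sum_products, df = 13792, 3012, 2294, 20
print(round(mean_square(ss_seed, df)))    # seed mean square
print(round(mean_square(ss_veins, df)))   # veins mean square
print(round(correlation(sum_products, ss_seed, ss_veins), 3))

# Ratio of the line 5 seed mean square to the residual mean square of line 9.
residual_ms = mean_square(27227, 90)
print(round(mean_square(ss_seed, df) / residual_ms, 2))
```

The same two functions can be applied to any other line of the table as a check on the printed figures.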

The meaning of this term can be illustrated as follows from the 
figure. While the cross-hatched blocks for journeys 1, 3, 4 and 5 
show a general correspondence in the two pots, for journey 2 there is 












no agreement; something exceptional happened in pot 2. This 
differential effect occurs for other runs and journeys, but it is not so 
important as the journey effects common to both pots referred to 
in (ii) above. Without samples from more than 3 cylinders it is 
difficult to be sure of the interpretation of this. 

Concluding these remarks regarding journey differences, one may 
say that: (i) any common trend such as might result from an ageing 
of the pot is very small; (ii) within runs there are very marked 
differences common to the two pots, both for seed and veins; these 
differences are correlated in seed and veins. Their origin would 
appear to lie in the furnace conditions unless there is any possibility 
of change in quality of the constituents of the glass from one journey 
to another; (iii) finally, there is evidence of some special factors 
sometimes influencing the quality of all cylinders from one pot, 
but not those from the other. When such factors come into play, 
they seem to affect both the seed and veins in the glass from the pot. 
Without records for more cylinders it may be difficult to interpret 
this. 

It must, of course, be remembered that these remarks apply 
only to glass B and to the early journeys. In the case of glass A 
the evidence for a common trend with the ageing of the pot in 
all runs is much stronger. 

Mr. Welch : I should like to discuss the question of correlation 
between seed and veins which the authors raise in paragraph 5 of 
their paper. They give there a table of correlations for the different 
types of glass for two periods: (a) 1931-33 and (b) 1934. The 
interpretation of these coefficients is made more difficult owing to 
the data upon which they are based not being homogeneous. As 
has been shown by the authors and as is brought out clearly in the 
analysis of variance tables to which Professor Pearson has referred, 
it is possible to assign parts of the variability to a number of specific 
factors—runs, pots, journeys, etc. It follows that, in considering 
the significance of a correlation coefficient, we should not take the 
effective size of the sample to be the total number of observations 
contributing to it, but, perhaps, the total number of runs or pots 
or journeys—in general, something very much less than the total 
number of observations. Some light on the importance of the 
different factors in causing correlation may be obtained by carrying 
through on the data for seed and veins an analysis of covariance 
analogous to the analyses of variance which one performs on the 
variables separately. The results of doing this for the selection of 
data in Table IV are given in the last two columns of Table VII. 
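
The idea of an analysis of covariance running parallel to the analysis of variance can be sketched in Python for a single level of grouping. This is an illustration only: Table VII has several nested levels, whereas the sketch splits the sums of squares and products into just a between-group and a within-group part; the function names and the paired counts are hypothetical.

```python
import math

def mean(values):
    return sum(values) / len(values)

def covariance_parts(groups):
    """Split sums of squares and products into between- and within-group parts.

    groups: list of groups, each a list of (seed, veins) pairs.
    Returns two dicts with keys 'ss_seed', 'ss_veins' and 'sp'.
    """
    pairs = [pair for group in groups for pair in group]
    grand_s = mean([s for s, _ in pairs])
    grand_v = mean([v for _, v in pairs])
    between = {"ss_seed": 0.0, "ss_veins": 0.0, "sp": 0.0}
    within = {"ss_seed": 0.0, "ss_veins": 0.0, "sp": 0.0}
    for group in groups:
        group_s = mean([s for s, _ in group])
        group_v = mean([v for _, v in group])
        for s, v in group:
            for part, ds, dv in (
                (between, group_s - grand_s, group_v - grand_v),
                (within, s - group_s, v - group_v),
            ):
                part["ss_seed"] += ds * ds
                part["ss_veins"] += dv * dv
                part["sp"] += ds * dv
    return between, within

def correlation(part):
    """Sum of products over the geometric mean of the two sums of squares."""
    return part["sp"] / math.sqrt(part["ss_seed"] * part["ss_veins"])

# Hypothetical (seed, veins) pairs: three pot-journeys of three cylinders each.
groups = [
    [(34, 6), (40, 9), (46, 8)],
    [(69, 52), (59, 38), (62, 41)],
    [(53, 10), (48, 18), (57, 14)],
]
between, within = covariance_parts(groups)
print(round(correlation(between), 3), round(correlation(within), 3))
```

A high correlation between group means together with a negligible one within groups would correspond to the situation described, in which the correlation arises from factors common to a group and not from the individual cylinders.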

The total correlation coefficient from the 150 observations is 
0.2989 (given in line 10 of the table). This coefficient would be 
adjudged significantly different from zero if the material were 
homogeneous, which, however, is not the case. The relation between 
the partial correlation coefficients given in the other lines of Table 
VII and the observations is illustrated in Fig. 4. Here a horizontal 
vein scale and a vertical seed scale are shown. The point A (v = 
14.05, s = 56.08) represents the grand mean of the 150 cylinders, 





while G (v = 13, s = 30) represents a single cylinder (viz. the third 
cylinder from third journey of second pot in fourth run). The 
parts into which the seed-count can be split (as described by Professor 
Pearson in Fig. 3) are shown on the left of the diagram, and similar 
contributions to the vein observation are shown at the top. The 


[Fig. 4. Analysis of correlation.] 



vectors AB, BC, CD, etc., have these marginal contributions as the 
components parallel to the two axes. 

In the analysis of covariance we consider the correlation between 
corresponding parts for seed and veins. If, for example, when the 
pot mean for veins is above the mean for the run, the pot mean 
for seed also tends to be so, i.e. if there is positive correlation, the 
vectors BC will tend to be in the diagonal direction shown for the 
observation of the diagram. If there is negative correlation, the 




vectors will tend to lie in a diagonal direction at right angles to this. 
If we take any particular type of contribution we can draw out 
from a fixed origin the line representing it: doing this for all the 
observations, we obtain a scattered diagram which connects up with 
the correlation coefficient listed in Table VII against the type of 
contribution under consideration. 

Thus, taking the 150 residuals, represented by vectors such as 
EG, if they are plotted about a common origin they are seen to lie 
haphazardly round it, and there is no evidence of correlation. This 
agrees with the non-significant coefficient of 0.0546 which is recorded 
in line 9 of the table. The factors affecting the residuals are: (a) 
the departure of individual cylinders from the general cylinder 
trend of the run, and (b) the error of sampling the cylinders. Towards 
(a) will contribute the variability due to the working of the 
glass from the time when it is withdrawn from the pot until it is in 
the form of a completed cylinder. There is thus no evidence, from 
the correlation observed, that seed and vein are introduced simultaneously 
in working the glass, nor that they are related to one 
another in their distribution over the finished sheets. 

We may now consider the journey effects. For any particular
pot-journey we may consider the journey effect as made up of two
parts: one part (represented by vector CD) common to both pots
in the furnace and another part not common (vector DE). The part
not common is what is termed a differential journey effect. In
line 5 of Table VII it is listed as “interaction” between pots and
journeys and the corresponding correlation is 0.3560, which is
approaching significance for 20 degrees of freedom. These interactions
may be due: (a) to conditions being slightly different in
the two positions which the pots occupy in the furnace (not a systematic
difference throughout the whole run, however, as this will be
included in “between pots,” line 2), (b) to differences in conditions
of filling the two pots (again not systematic throughout the run).
Further, since each pot-journey is the sum of three separate cylinder
values, the residual variation enters to a certain extent into the
interactions. As these residuals seem to be uncorrelated, and as the
effect of the presence of uncorrelated parts is in general to mask
the effect of true correlations present, we may attach more significance
to the correlation of line 5 than we should if correlation had
already been demonstrated in line 9.
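
The phrase “approaching significance” can be checked numerically. The following is a minimal sketch and not part of Mr. Welch's contribution: it converts the quoted coefficient into Student's t and finds the smallest coefficient that would reach the 5 per cent. level. The function names are illustrative, the critical value 2.086 is taken from standard tables of t rather than from the paper, and it is assumed that the coefficient is referred to t on exactly 20 degrees of freedom, as the wording suggests.

```python
import math


def t_from_r(r: float, df: int) -> float:
    """Student's t equivalent to a correlation r on df degrees of freedom."""
    return r * math.sqrt(df / (1.0 - r * r))


def r_from_t(t: float, df: int) -> float:
    """Inverse of t_from_r: the correlation that corresponds to a given t."""
    return t / math.sqrt(t * t + df)


# Assumption: two-sided 5 per cent. point of t for 20 degrees of freedom,
# taken from standard tables, not from the paper under discussion.
T_CRIT_20 = 2.086

r_interaction = 0.3560  # line 5 of Table VII, as quoted in the text
t_obs = t_from_r(r_interaction, 20)
r_crit = r_from_t(T_CRIT_20, 20)

print(f"t = {t_obs:.2f}, 5 per cent. value of r = {r_crit:.3f}")
```

On these assumptions t is about 1.70 and the 5 per cent. value of the coefficient about 0.423, so that 0.3560 falls short of it by a moderate margin. If the test were instead entered with 19 degrees of freedom the critical coefficient would be slightly larger and the conclusion the same.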

The journey contribution common to pots (lines 3 and 4 together,
Table VII) will be determined by such factors as: (a) the condition
of the furnace as a whole for the particular journey, (b) the fluctuation
in the glass mixture, and also to a certain extent by the interaction
and residual effects. These latter, however, are relatively
unimportant, and if, as the authors state, the control of the mixture
is good the correlation 0.4386 of lines 3 and 4 can probably be
ascribed to fluctuations in the furnace conditions affecting vein
and seed similarly.

The other three correlations cannot be interpreted owing to the 
small number of degrees of freedom available for each. In any 
case, one would not attempt to treat the 3 cylinder values as samples
from normal populations, since cylinder differences are due largely
to a definite trend likely to be repeated in future experience. One 
may note that this trend is in the same direction for seed and vein 
and that the same would probably be the case if cylinders other 
than the third, tenth and sixteenth were sampled. 

It is useless to base any conclusions as to correlation on the
behaviour of the 5 independent “between pot” contributions,
for we cannot say how likely it is that effects, observed for the 5
particular pairs of pots, will be repeated for other pairs of pots
in future. Effectively we have only a sample of 5 from which to
reason. The same applies to run comparisons.

To summarize, we may say that there is no indication of correlation
due to factors causing departure from cylinder trend or to
sampling the glass, that conditions of the furnace and filling the
pots do seem to influence seed and vein similarly to some extent,
and that the cylinder trends for seed and veins (as far as it is possible
to judge from the three cylinders sampled) are in the same direction.
It is to be emphasized, of course, that only glass B has been considered
here.

In conclusion, I should like to add my thanks to those already 
expressed to Mr. Gould and Dr. Hampton for their interesting 
paper, and for the data they have put at our disposal. 

Dr. Gooding said he would like to preface his remarks by
congratulating the authors on the paper which had been read that
afternoon. As he was engaged in the glass industry, although in
an entirely different type of manufacture from that dealt with by
the authors, the publication of this paper had been of considerable
interest to him and had provided many hours' entertainment.

He had been fortunate in having had an advance copy of the 
paper, and had been able to consider it in some detail, not, however, 
from the point of view of the rigid mathematical approach of the 
statistician. 

Although he was going to raise a number of controversial points,
he would do so largely to discover what the statisticians' point of
view would be with respect to them. In other words, he wished to
obtain (and he was sure that all would share the wish) the maximum
amount of benefit from the data presented.

At the outset he would like to ask Dr. Hampton one question. 
It had been mentioned that 18 to 20 cylinders were made from one 
pot of glass; it was not quite clear whether one pot per day was 
meant, or whether the whole pot was worked out in 20 cylinders. 

Dr. Hampton explained that a pot was worked down as far as 
it was considered expedient to do so, and actually that meant 16 
cylinders out of the pot in about 6 hours. 

Dr. Gooding considered that point to be interesting, because 
he had noticed, in graphing the results, that there were certain 
irregularities in the curves, and he wondered whether they would 
be correlated with the refilling of the pots and further melting.

If the results in the authors' Table I, p. 144, were plotted on a fairly
open scale, using as co-ordinates the number of the journey and the
mean quantity of seed or veins, it would appear that as the journey
number increased the mean number of seed or veins also increased,
particularly for glasses A and D. Glass B did not behave in the
same manner. This would seem to indicate the existence of some
relationship between the journey number and number of seeds or
veins for glasses A and D. The effect so observed for glass A was
not so clear in the authors' Fig. 2, because: (a) as the journey
number increased the standard error increased; (b) there was an
initial fall in the seed-count before the increase occurred. The same
applied in the case of glass D, but not so markedly.

Considering the data for veins for the same glasses, the general
tendency for glasses B and A to become worse with time was more
noticeable.

In Section 6 of the paper, it was stated that there was no significant
difference between the quality of any one journey and the means
of all the journeys. This conclusion had been based on the assumption
that the results obtained were a random sample of a much
larger number of data. In other words, if the standard deviation
for each glass is obtained, assuming that the results are representative
of a random sample, then the values fall within the limits so
obtained. Might not this method of estimation give results which
to some extent masked the general tendencies?

If the data given in the Appendix were examined first of all
for seed, each particular run being considered separately, it would
be observed that in several instances there was an unmistakable
increase in seed-count as the journey number increased. This was
also true for individual runs, considering the vein-count only. In
many instances another tendency, which to some extent modified
the previous one, seemed to be indicated. This consisted of an
initial fall in the seed- or vein-count before the general increase
occurred. It was thought desirable to consider the relationship
between seed and journey number and between veins and journey
number, separating the results for each glass and each furnace.
Considering seed first, Table VIII below showed that for pot No. 1,
glass A, furnace 18, in which the mean of five separate runs was
given, there was no general tendency for seed to increase or decrease,
but for pot 2 there did appear to be a general tendency for the seed
to increase with the journey number. Similar values were included
for glass A, furnace 19, for which only one value was available
for each pot. These results showed an initial fall in the seed-count
followed by marked increase. The values for glass A obtained in
furnace 19 on September 5th, 1931, showed a similar tendency.
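
The contrast drawn between the two pots of glass A in furnace 18 can be expressed as a regression of seed-count on journey number. The sketch below is an illustration only, not a calculation reported by Dr. Gooding; the function `slope` is a hypothetical helper, and the figures are the mean seed-counts for journeys 1 to 10 as read from Table VIII.

```python
def slope(y):
    """Least-squares slope of y on journey number 1, 2, ..., len(y)."""
    n = len(y)
    x = range(1, n + 1)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx


# Mean seed-counts, glass A, furnace 18, journeys 1 to 10 (Table VIII).
pot_1 = [62, 72, 46, 59, 61, 45, 56, 66, 66, 73]
pot_2 = [35, 45, 43, 50, 90, 63, 63, 46, 88, 78]

slope_pot_1 = slope(pot_1)
slope_pot_2 = slope(pot_2)
print(f"pot 1: {slope_pot_1:+.2f} seeds per journey")
print(f"pot 2: {slope_pot_2:+.2f} seeds per journey")
```

The slope for pot 1 comes to 0.8 and that for pot 2 to about 4.3 seeds per journey, which agrees with the verbal description. Such slopes take no account of the standard errors, a limitation the speaker himself acknowledges later.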

In certain of the results for glass B there was an initial fall in
seed-count over the first two or three journeys, but, as already mentioned,
there did not appear to be any noticeable tendency for the seed
for glass B to alter as the journey number increased. For glasses
C, D and E the fact that in many instances these were melted in
pots which had already been used for melting other glasses introduced
a new variable which might be of importance. In addition,
for glasses C and E the number of results was so small that there
was little point in considering them in detail; but even for these
there were indications that seed increased with journey number.
For glass D there were more values, and in Table VIII the mean
seed-count on 7 runs of this glass for furnace 18 was given. There
was again an indication of an initial fall in seed followed by a rise,
but it should be noted that the first two figures recorded for each pot
depended on only one determination. A somewhat similar result
was obtained when the seed-count on glass D, furnace 20, was
considered.

Table VIII.

Relationship between Seed-Count and Journey Number.

                                   Seed-Count.
Glass.   Furnace.   Journey.     Pot 1.   Pot 2.

  A         18          1           62       35
                        2           72       45
                        3           46       43
                        4           59       50
                        5           61       90
                        6           45       63
                        7           56       63
                        8           66       46
                        9           66       88
                       10           73       78

  A         19          1           76       59
                        2           44       45
                        3           44       41
                        4           51       45
                        5           84       50
                        6          101       94

  A         19          1           75       90
                        2           51       53
                        3           35       28
                        4           97       68
                        5          128       90
                        6           94       98
                        7           85       73

  D         18          1           45        -
                        2           33        -
                        3            -        -
                        4            -      103
                        5            -       72
                        6           57       77
                        7           54       43
                        8           55       56
                        9           62       55
                       10           67       65

  D         20          1           74       51
                        2           42       56
                        3           53       39
                   4 or 5           33       80
                        6           40       69
                        7           68       64
                        8           73       69
                        9           65       79
                       10           69       66

With regard to veins, the results given in Table IX below for 
glasses A and D indicate that much the same results were obtained. 

The above analysis seemed to indicate that by grouping all the 
results for all furnaces together for each particular glass, there was 
a danger that certain tendencies in the curves would be smoothed 
out. In addition, it seemed that when the results for each glass 
for each furnace were dealt with separately, two general tendencies 
were exhibited in a number of instances. In a few the first of these 
tendencies was not observed. 

The two tendencies were : (a) an initial fall in the quantity of 
seed or veins as the journey number increased, and (b) thereafter 
an increase in the quantity of seed or veins as the journey number 
increased. 

If the first tendency in certain glasses was found to exist, then 
there must be some other factor operating during the first few 
journeys for these glasses, and this factor was different and opposed 
to that which operated later. 

As the result of the previous discussion, the speaker found
himself wondering about the conclusion reached by the authors in
Section 6 of their paper, at least as far as certain glasses were concerned.
Section 7 assumed the truth of the result of Section 6 in
order to make further deductions. It would seem that the proof
of the statement in Section 7 became much more complex than would
appear at first sight, but by using a graphical method of estimation
it appeared that the order in which the glasses had been placed by
the authors with regard to seed or veins was not affected by the
considerations raised.

Another point occurred to him with regard to Section 5; for 
glasses C and D (1931-33) a very low value of correlation coefficient 
between veins and seed in the same type was obtained. In 1934 
negative correlations were obtained. Would the fact that other 
glasses had been melted in the same pots before the melting of these 
particular glasses be expected to have any effect on the correlation 
coefficient ? This question does not arise with regard to glass B, 
and the results for glass E were so few that it seemed doubtful 
whether the correlation coefficient was very reliable. 

It would be observed that the speaker had neglected throughout 
to consider standard errors or deviations in order to exhibit certain 
results. He would like to ask the statisticians present whether the 
results for certain of these glasses, notably A and D, were sufficient 
in number to enable them to test the validity of the considerations 
here suggested. 

He noted that Professor Pearson's remarks on the analysis of the
results for glass B were confined to 5 journeys and 5 particular
runs, the reason being that for these runs complete figures were
available. On the other hand, there were several incomplete runs
on glass B not mentioned, and in the case of other glasses the data,
particularly for certain of the earlier journeys, were entirely lacking.
Could the same methods of analysis of variance be applied to data
of this incomplete type?

Table IX.

Relationship between Veins and Journey Number.

                                     Veins.
Glass.   Furnace.   Journey.     Pot 1.   Pot 2.

  A         18          1           12       13
                        2           27       14
                        3           20       11
                        4           23       14
                        5           13       28
                        6           12        6
                        7           18        8
                        8           31       29
                        9           14       36
                       10           16       25

  A         19          1            0        3
                        2           17        9
                        3           16        4
                        4           19     1[?]
                        5           36       28
                        6           12       48

  A         19          1           27       37
                        2            9        5
                        3            9        5
                        4           33       21
                        5           57       58
                        6           53       38
                        7           39       37

  D         18          1            4        -
                        2            1        -
                        3            -        -
                        4            -       14
                        5            -        9
                        6           20     3[?]
                        7           28       23
                        8           26       29
                        9           34       14
                       10           34       22

  D         20          1           13        5
                        2            8        6
                        3           12       12
                     [two rows illegible]
                        7           37       22
                        8            7       17
                        9           22       32
                       10            4       28

Dr. Gooding said he would like to leave these numerous questions
with the statisticians; he made no claim to be a statistician himself,
but in the results he had obtained in other connections he had
found it necessary to consider just the kind of problem submitted
that evening, and the questions he had asked had repeatedly occurred
to him in connection with such work.

Mr. Yates said that he proposed to say a few words on the
methods which might be employed in the design of experiments in
work of this kind. The authors of the paper were concerned to
ascertain the causes of variation between pots, and also the differences
between different types of glass; they had given detailed particulars
of quality measurements on 5 types of glass, and had formed estimates
(Table II) of these quality differences. In particular, they
had noted that there was a marked superiority of glass C over glass B
as regards seed, the estimated differences being: 1931-33, 17.4;
1934, 14.5.

Endless complications might arise in the analysis of heterogeneous 
and unbalanced data of this type. In order to test the validity of the 
estimates of Table II the speaker had taken out the mean differences 
in seed of B and C in those runs in which both B and C occurred 
together. These differences were found to be as follows : 


     Run.          Difference.

    2.12.33           31.5
    12.6.34            9.8
    23.8.34            6.2
   14.10.34           12.4

     Mean             15.0


This agreed well with the values of Table II. In each run, however,
glass B occurred before glass C, and in all these 4 runs the first journey
was decidedly higher than the later journeys (as the last speaker
had emphasized). If this first journey were omitted, the differences
were then as follows:


     Run.          Difference.

    2.12.33           13.0
    12.6.34            1.5
    23.8.34            0.5
   14.10.34            8.1

     Mean              5.8

This omission, therefore, caused a very striking reduction in the
mean difference.
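
The two sets of run differences can be summarized by their means and by the t statistic for a mean difference, the test Mr. Yates goes on to recommend for designed comparisons. The sketch below is an illustration only and was not part of his remarks: `mean_and_t` is a hypothetical helper, the differences are assumed to be B minus C, and with four runs in which B always preceded C the t values carry little weight.

```python
import math
import statistics


def mean_and_t(differences):
    """Mean of per-run differences and the t statistic for testing it against zero."""
    n = len(differences)
    mean = statistics.mean(differences)
    std_err = statistics.stdev(differences) / math.sqrt(n)
    return mean, mean / std_err


# Differences in seed for the four runs quoted (assumed to be B minus C).
all_journeys = [31.5, 9.8, 6.2, 12.4]
first_omitted = [13.0, 1.5, 0.5, 8.1]

mean_all, t_all = mean_and_t(all_journeys)
mean_omit, t_omit = mean_and_t(first_omitted)
print(f"all journeys:          mean {mean_all:.1f}, t = {t_all:.2f} on 3 d.f.")
print(f"first journey omitted: mean {mean_omit:.1f}, t = {t_omit:.2f} on 3 d.f.")
```

The means reproduce the tabulated 15.0 and 5.8; the corresponding t values are roughly 2.65 and 1.96 on 3 degrees of freedom.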

This feature of the observations served to illustrate the first
fundamental principle necessary for satisfactory experiment design,
namely, that of randomization. Had the choice of whether B or C
should come first been made wholly at random, then both B and C
would have had an equal chance in each run of the possibly unfavourable
first journey.

It was, of course, possible that manufacturing considerations 
compelled the use of glass B before glass C, but in that case any 
conclusion as to the superiority of glass C was likely to be of doubtful 
validity. 

The efficiency of the design could also be improved very considerably.
As the tests were at present arranged, differences between
journeys had not been eliminated, because the same type of glass
had, with rare exceptions, been melted in both pots on the same
day. A typical arrangement, for instance, of two glasses had been
as follows:

                 Pot 1.   Pot 2.

  Journey 1        B        B
     „    2        B        B
     „    3        B        B
     „    4        C        C
     „    5        C        C
     „    6        C        C


The analysis of variance given by Professor Pearson showed that
the mean squares between runs, between pots within runs, and
between journeys, were very considerably greater than the interaction
between pots, journeys, and runs. The above arrangement
was one in which the effective experimental error (supposing there
was proper randomization) was substantially that given by items
3 and 4 of Professor Pearson's Table, namely 2,514. In order to
eliminate the difference between journeys as well as between runs
and pots, an arrangement of the following type might be adopted:

                       I.                 II.
                 Pot 1.   Pot 2.    Pot 1.   Pot 2.

  Journey 1        B        C         C        B
     „    2        B        C         C        B
     „    3        B        C         C        B
     „    4        C        B         B        C
     „    5        C        B         B        C
     „    6        C        B         B        C

the selection between I or II being made in each run by random 
choice. Each run would then provide a single estimate of the 
difference between the two glasses, and the significance of the mean 
difference over several runs would be tested by the t test. By an
arrangement of this type the error variance would be reduced to
something of the order of 690 (item 5 of Professor Pearson’s table). 
A fourfold gain in efficiency might, therefore, be expected. 
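
The random choice between arrangements I and II, and the gain in efficiency quoted, can be sketched as follows. This is an illustration under stated assumptions, not a procedure given by Mr. Yates: the function names, the number of runs and the seed are hypothetical, and efficiency is taken simply as the ratio of the two error variances quoted from Professor Pearson's table.

```python
import random

# The two arrangements tabulated above: glass melted in (pot 1, pot 2)
# for journeys 1 to 6.
ARRANGEMENT_I = [("B", "C")] * 3 + [("C", "B")] * 3
ARRANGEMENT_II = [("C", "B")] * 3 + [("B", "C")] * 3


def allocate(n_runs, seed=None):
    """Choose arrangement I or II independently and at random for each run."""
    rng = random.Random(seed)
    return [rng.choice(["I", "II"]) for _ in range(n_runs)]


def relative_efficiency(old_error_variance, new_error_variance):
    """Ratio of error variances: runs of the old design worth one of the new."""
    return old_error_variance / new_error_variance


plan = allocate(n_runs=6, seed=1936)   # hypothetical: six runs, arbitrary seed
gain = relative_efficiency(2514, 690)  # items 3 and 4 against item 5, as quoted
print(plan)
print(f"relative efficiency = {gain:.2f}")
```

The ratio comes to about 3.6, which is the "fourfold gain" of the text in round figures. In either arrangement every journey contains one pot of each glass, so that journey differences cancel from the comparison.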

If several glasses were to be compared, then a set of comparisons 
of all possible pairs could be made in this manner. With four 
glasses for instance, there would be six pairs : 


A-B 

A-C B-C 

A-D B-D C-D 

If each pair was replicated twice (12 runs in all) there would be 3 
treatment degrees of freedom and 8 error degrees of freedom, since 
each run in effect gave a single comparison.* 
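
The count of pairs and degrees of freedom can be reproduced mechanically. The sketch below follows the accounting as the text states it (one comparison per run, hence one fewer total degrees of freedom than runs); it is an illustration and makes no claim about the full analysis described in the paper cited in the footnote.

```python
from itertools import combinations

glasses = ["A", "B", "C", "D"]
replicates = 2

pairs = list(combinations(glasses, 2))  # every unordered pair of glasses
runs = [pair for pair in pairs for _ in range(replicates)]

total_df = len(runs) - 1                # each run yields one comparison
treatment_df = len(glasses) - 1
error_df = total_df - treatment_df

print(pairs)
print(len(runs), treatment_df, error_df)
```

With four glasses this gives the six pairs listed, 12 runs, 3 treatment degrees of freedom and 8 error degrees of freedom.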

Investigations into the causes of variation in the pots could be 
carried on concurrently with the investigation of differences between 
glasses, for pot 1 and pot 2 were equalized as regards glasses in any 
one run, and could, therefore, be used to compare different types of 
pot or conditions of pot manufacture. 

In conclusion, Mr. Yates added his thanks to those of the previous
speakers to Mr. Gould and Dr. Hampton for what he considered
to be a most interesting paper, and in addition a most illuminating
account of glass working. He was most pleased to see that at these
meetings at any rate it was not considered to be beneath the dignity
of the statistician to acquaint himself with the brute facts governing
the material from which his figures were derived.

Mr. Jennett hoped that the authors would not take it as a
reflection on them if he said that what he appreciated most was
the way the data had been supplemented by the analyses put forward
by the statisticians. It was refreshing to go from a mass of data
to a table like Professor Pearson's, where the data were set out
concisely and systematically and in a form most appropriate for the
use of works managers, who would not wish to wade through a mass
of data.

Mr. Jennett was afraid he could not add much to the discussion;
anything he might say would be in the form of questions to the
statisticians. One thing he would like to mention concerning the
presentation of correlation coefficients. In presenting the data
respecting lamps, in order to forecast lamp quality, they were reminded
that the regressions were important, rather than the coefficients of
correlation. He saw no mention of regression in connection with the
analysis of these data. He appreciated that there was a difference
with regard to correlating pots 1 and 2, but with regard to the
seed-count from different journeys, was there any reason why regressions
should not be worked out?

With reference to the missing data, Professor Pearson had taken
the cream of the results for his analysis. Was it convenient to the
statistician to take only balanced experiments? In practice they
were concerned with two cases: the analysis of data already existing,
and the planning of experiments with a view to the analysis
of the results derived therefrom. In both cases industrial workers
were frequently up against material that was not balanced; although
they might plan their experiments, they often found when they
came to analyse them that the resulting data were not balanced,
owing to the total loss of some items. Whether methods of analysis
already existed or were being developed, it was certain that there
was very great need in industry for ways of dealing with such sets
of results.

* The analysis of this type of arrangement is described by F. Yates in
“Incomplete Randomized Blocks,” Annals of Eugenics, vol. VII, pp. 121-140.

Mr. Gosset said he would like to confirm a remark of Professor
Pearson's about the difficulty of working with large-scale results.
In most cases the whole object, or one of the principal objects,
of manufacture is to keep the product as uniform as possible. In
addition to that and in order to obtain that, it is necessary to keep
the raw materials as constant as possible; consequently, when one
looks at large-scale results, there is no variation to work upon,
and the statisticians are helpless, at any rate until something has
gone wrong.

Mr. Gosset said that up to the present he had been interested 
in spectacle glass only as a consumer, and his excuse for intervening 
in this discussion was that he could illustrate the use of a simple 
statistical method on the tables which were given at the end of the 
paper. 

In an investigation such as this, where one wished to throw light 
on the behaviour of a large-scale process, the method of correlation 
was very often useful, but at first sight the tables did not look very 
promising, split up as they were, into very small samples, both by 
the small numbers of journeys per pot and the different kinds of 
glass. In this connection he would say to Mr. Jennett that there 
were two uses in correlation, one was the use of the regression line, 
and that was doubtless the best, and the other its use merely as a 
measure of the relation between the two things. 

There was a method of correlation used largely by psychologists,
known as “Spearman's method”; it was not an efficient method,
that is, it did not utilize all the information supplied by the samples,
so that about 20 per cent. larger samples must be collected to give
as accurate a result as the ordinary correlation coefficient, yet,
owing to an artful method of calculation, it was so simple that when
playing with other people's figures, for instance on a railway journey,
it was the obvious one to use. It consisted of replacing each variate
by the figure representing its numerical order, and correlating these
numbers.
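
The method just described, replacing each variate by its numerical order and correlating the ranks, can be sketched as follows. This is an illustration only: the function names are hypothetical, tied values are given the average of their ranks (a convention the speaker does not mention), and the example uses the seed-counts for glass A, furnace 19, pot 1, from the seven-journey series of Table VIII rather than the vein figures Mr. Gosset worked with.

```python
import math


def ranks(values):
    """Rank values from 1 upward, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the block of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Product-moment correlation of the ranks of x and y."""
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sxx = sum((a - mean_x) ** 2 for a in rx)
    syy = sum((b - mean_y) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy)


journeys = [1, 2, 3, 4, 5, 6, 7]
seed = [75, 51, 35, 97, 128, 94, 85]
rho = spearman(journeys, seed)
print(f"rank correlation = {rho:.3f}")
```

For these seven journeys the squared rank differences sum to 26, and the coefficient is 15/28, or about 0.54.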

By this method, Mr. Gosset said he had obtained weighted
average correlation coefficients between the number of veins and the
order of the journey, which put it beyond all question that the later
the journey the worse the veins. This weighted average was derived
from all the 98 samples discoverable in the tables: the mean size
of sample was just over 4, the greatest was 8 and the smallest 2.
The average results were as follows:

  For A   ...   0.31    2.6 times its standard deviation.
   „  B   ...   0.27    3.1   „    „     „         „
   „  C   ...   0.44    [illegible]
   „  D   ...   0.11    0.8   „    „     „         „
   „  E   ...   0.19    1.3   „    „     „         „

  Total   ...   0.26    4.5   „    „     „         „


A, B, and C were all significant. D and E were not so, but there
was no evidence that any glass behaved differently from the others.
When he said “standard deviation” it was calculated on the
supposition that there was no correlation at all. It meant the
standard deviation of correlation coefficients of samples of the
appropriate degrees of freedom drawn from uncorrelated material,
and the mean 0.26 corresponded to a correlation coefficient of about
0.30 if large samples had been obtainable.
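
A weighted average of this kind, and its standard deviation on the supposition of no correlation, might be computed as below. The sketch rests on two assumptions not stated by the speaker: that a rank correlation from n uncorrelated pairs has variance 1/(n - 1), and that each sample is weighted by n - 1. The coefficients and sample sizes in the example are hypothetical.

```python
import math


def null_sd(n):
    """Standard deviation of a rank correlation from n uncorrelated pairs."""
    return 1.0 / math.sqrt(n - 1)


def pooled(coefficients, sizes):
    """Weighted mean of coefficients (weights n - 1) and its null standard deviation."""
    weights = [n - 1 for n in sizes]
    total = sum(weights)
    mean = sum(w * r for w, r in zip(weights, coefficients)) / total
    return mean, 1.0 / math.sqrt(total)


# Hypothetical coefficients and sample sizes, for illustration only.
mean, sd = pooled([0.5, -0.2, 0.4], [5, 3, 8])
print(f"mean = {mean:.3f}, ratio to null s.d. = {mean / sd:.2f}")
```

On the same assumptions, 98 samples averaging just over 4 observations would give a summed weight near 300 and a null standard deviation near 0.058, so that a mean of 0.26 would be some four and a half times it.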

This did not confirm the authors' conclusion, and Mr. Gosset
could offer no opinion as to the disagreement unless it was the
custom to stop using, at an early stage, pots which had given poor
results. A similar investigation into seeds showed that there was
no evidence of correlation between seeds and order of journey except
in the case of glass A, where the correlation was 0.27, 2.3 times the
standard deviation.

He had also tested the correlation between refractive index and
both seeds and veins, the former without any success, but there was
a distinct indication that the higher the refractive index, the worse
the veins; perhaps the veins themselves had a low refractive index.
The evidence was not significant, since the correlation coefficient
0.16 was but 1.6 times its standard deviation, but if the matter
was of any importance, this might give a line for further investigation.

Mr. Gosset again expressed his great interest in the subject- 
matter of the paper. 

The Chairman said that it was his duty in closing the meeting
to propose a vote of thanks to the originators of the discussion,
and in doing so he would like to include those who had contributed
to it. If the industrial man would study these contributions he was
sure that progress would be made. In conclusion, he would like to
give a glass-house story. When he was first introduced to glass-making
in 1913, they wanted one day to get rid of seed, and an
old glass-worker said, “We will put some arsenic in.” A fortnight
later there was a considerable amount of seed, and the same workman
said, “Well, take the arsenic out.”

On being put to the meeting the vote of thanks was carried 
unanimously. 

Mr. E. D. van Rest sent the following contribution after the 
meeting: 

There is an extension to the problem which has not been mentioned
by the authors or subsequent speakers, perhaps from an
unworthy desire to keep the argument simple.

The glass-maker is interested, not directly in the number of
seeds and veins, but in the number of clear squares or circles he can 
obtain from a cylinder. The authors mention early in their paper 
that they found a linear relation between this number and the 
number of seeds plus length of veins, and therefore used the latter 
as a measure of the former. Different dispersions of the defects 
would allow different yields of clear glass, so that it would be of 
interest to the glass-maker to know the variation in the relation, 
and perhaps make use of it to obtain greater yields. The statistician 
can help here, for an analysis of the variation could be drawn up on 
lines parallel to that presented by Professor Pearson, but dealing with 
regression lines (relating number of clear squares to seeds and veins 
on each cylinder), and differences between regression lines for pots, 
journeys, and glasses. The twelve observations on each cylinder 
would be sufficient to fit a regression line involving only two terms 
(seeds and veins). Such an analysis might be more informative 
than the simple one. The authors could judge whether the extra 
work was likely to be profitable. 

Information concerning the arrangement of the defects is hidden 
in the 12 samples taken over the length of each cylinder, of which 
only averages are presented in this paper. Should it be proved 
that defects were more frequent in certain parts of each cylinder, 
this would be one step towards eliminating them. Perhaps the 
authors have already satisfied themselves that the distribution 
of defects is sufficiently uniform not to make the extra work worth 
while. 

Mr. Gould and Dr. Hampton replied in writing, as follows:

We should like first to place on record our appreciation of the
interest and assistance that Professor Pearson gave us in connection
with the preparation of this paper. Since we are primarily
concerned with the manufacture of glass, and do not claim to be
expert statisticians, we were somewhat diffident about putting this
paper forward, and it was only because of Professor Pearson's
encouragement that we attempted to go on with it. The discussion
which followed the paper seems to justify his opinion that the data
we had accumulated would be of considerable interest to statisticians
in general.

We do not propose to comment on the method of the Analysis
of Variance used by Professor Pearson, but some of his conclusions
have raised other questions which are closely associated with the
manufacture. We did not mention in our paper that there were
three sources whereby extra seed could be contributed to the finished
glass, and in view of Professor Pearson's suggestion that some of
the differences in his residuals may be due to the variation in
manipulative skill of the operative, it seems desirable to make mention
of these variables now. In the first place, it is a matter of opinion
when founding is stopped and the glass is allowed to cool. There
is a source of variation here, since on some occasions the glass will
obviously be cooled when it contains more bubbles than it does
on some other journey. It is probable that the melt is never entirely
free from bubbles. In the gathering operation it is known that
by exercising slightly less care the gatherer may introduce bubbles
between one gathering and the next, but there is no way in which
the gatherer can reduce the number of bubbles. It is clear, therefore,
that on occasion more bubbles will be introduced into particular
cylinders by this means. It is also known that during the reversal
of the gases in the furnace, for instance, a flush of gas may come
through, and this will cause further bubbles. Again, there is no
means whereby the number of bubbles can be reduced. All three
of these variables tend to increase the number of the bubbles in
particular cylinders, and it may be that this is the cause of the
spread of the frequency curve towards the end showing a large
number of bubbles, and it is suggested that this is the explanation
of the occasional cylinders which are very seedy. It is interesting
to note that Professor Pearson's conclusions after his analysis of
the selection of results that he took agree generally with those put
forward from our own investigation, namely, that any ageing
effect is small, that the differences within runs are correlated in
the two pots, and that the pot itself does appear to have an effect
on the quality.

Mr. Welch’s conclusion that there is no evidence that seed and 
veins are introduced simultaneously, nor that they are related to 
one another, again confirms the conclusions we reached. This is 
a matter of considerable practical importance, since it means that 
it is possible to reduce the amount of either defect without necessarily 
increasing the other. 

The points discussed by Dr. Gooding need the attention of 
statisticians rather than of glass manufacturers, since only the 
mathematician can decide whether the tendencies that he has 
noticed in the individual figures are justifiable conclusions from the 
data provided. 

In reply to Mr. Yates, we can only say that while we appreciate 
the desirability, from the statistician’s point of view, of carrying 
out controlled experiments in order to prove certain points, it is 
impossible for practical reasons, particularly as different founding 
schedules are required for each type of glass, and therefore it is 
undesirable to make one pot of each of two types in the furnace 
simultaneously unless this is absolutely unavoidable. It would 
certainly be difficult to justify such a procedure merely for the sake 
of providing a satisfactory answer to certain statistical questions. 

Mr. Gosset’s application of Spearman's method of correlation 
by ranks leads to a conclusion which differs from that which we 
drew, and incidentally from that drawn by Professor Pearson, 
from a part of the data. Whether this means that the method 
itself is insufficiently accurate to settle the question, or that 
the difference is due to Professor Pearson's selection of results, we 
cannot say. Mr. Gosset has, however, deduced by his method 
something which is actually the case, although it was not mentioned 
in the paper. It is generally the custom to stop manufacture 
when a pot appears to be giving bad results. We have not been 
able to satisfy ourselves on theoretical grounds that such a procedure 
is justified, but in the absence of definite evidence of its 
undesirability, the practice is likely to continue. 

In reply to Mr. van Rest, we can state that we have already 
satisfied ourselves that the distribution of defects is sufficiently 
uniform over the cylinder to make it unnecessary to carry out the 
extra work he suggests. 



The Distribution of “Student's” Ratio for Non-Normal Samples. 

By R. C. Geary, M.Sc. 


1. Essential Role of Ratio in Normal Theory. 

The simplicity of form of “Student's” well-known distribution of 
the ratio t of the arithmetic mean to standard deviation, 

$$t = \frac{\sqrt{n'}\,\bar{x}}{s}, \quad \text{with} \quad n'\bar{x} = \sum_{i=1}^{n'} x_i \quad \text{and} \quad (n'-1)s^2 = \sum_{i=1}^{n'} (x_i - \bar{x})^2, \qquad \text{(i)}$$


in normal samples, is due principally to the statistical independence 
of the mean and the variance, $s^2$, in such samples. The property, 
which was first stated by “Student”* in 1908 and stringently 
proved by R. A. Fisher† in 1925, is peculiar to normal samples. 
In fact, if x and y be the measures of two variates, the necessary and 
sufficient condition that these should be independent is that 

$$\kappa_{ij} = 0,$$

when neither of the subscripts i and j is zero, $\kappa_{ij}$ representing the 
two-dimensional semi-invariants of x and y defined by the identity 
in $\alpha$, $\beta$:— 

$$\exp\Big\{\sum \kappa_{ij}\,\frac{\alpha^i \beta^j}{i!\,j!}\Big\} \equiv \text{Universal mean}\,[e^{\alpha x + \beta y}].$$

Now R. A. Fisher‡ has shown that the semi-invariants, $K_{i1}$, of the 
mean $\bar{x}$ and the variance $s^2$ bear the following simple relations to the 
semi-invariants $\kappa_{i+2}$ of the parent universe 

$$K_{i1} = \frac{\kappa_{i+2}}{n'^{\,i}}.$$

When the variates are independent, all the $K_{i1}$ are zero, and so, therefore, 
are the parameters $\kappa_3$, $\kappa_4$, $\kappa_5$, . . . , which are the necessary 
and sufficient conditions of normality in the parent universe. It is 
remarkable that it has only been necessary to utilize the conditions 
$K_{i1} = 0$ (without taking account of the series $K_{i2} = 0$, $K_{i3} = 0$, etc.) in 
order to establish normality. 

* Biometrika, Vol. VI, pp. 1-25. † Metron, Vol. V, No. 3, p. 93. 
‡ Proceedings of the London Mathematical Society, Series 2, Vol. 30, p. 206. 

This property of statistical independence is but another instance 
of the essential role of the arithmetic mean and the standard deviation 
in normal theory. Accordingly, it is not surprising to find that the 
determination of the frequency distribution of their ratios for non- 
normal samples of more than three seems to present extreme difficulty, 
even when the parent universe is presumed to have the simplest 
algebraical form. The solution for samples of two is usually quite 
easy (especially so for symmetrical universes) because in this case 

$$t = \frac{x_1 + x_2}{|x_1 - x_2|},$$

samples being assumed measured from universal mean zero. P. R. 
Rider* has given the following solution for a rectangular parent 
universe:— 

$$\frac{dt}{2(1 + |t|)^2}.$$

The solutions for the exponential universe $\tfrac{1}{2}e^{-|x|}\,dx$, namely 

$$\tfrac{1}{4}\,dt \quad (|t| \le 1), \qquad \frac{dt}{4t^2} \quad (|t| \ge 1),$$

and for the tangential universe $\dfrac{1}{\pi}\,\dfrac{dx}{1 + x^2}$, namely 

$$\frac{1}{\pi^2 t}\,\log\left|\frac{t+1}{t-1}\right|\,dt,$$

have some slight theoretical interest. V. Perlo† has given the 
solution of the rectangular problem for samples of three. 
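
The rectangular solution for samples of two can be checked by simulation. The sketch below assumes the density dt / (2(1 + |t|)^2), which implies an upper tail P(t > T) = 1 / (2(1 + T)) for T >= 0; the function names, the seed and the number of trials are illustrative choices, not part of the paper.

```python
import random


def student_ratio_pair(x1, x2):
    """Student's ratio for a sample of two measured from universal mean zero."""
    return (x1 + x2) / abs(x1 - x2)


def upper_tail_rectangular(T):
    """P(t > T) for T >= 0 implied by the density dt / (2 (1 + |t|)^2)."""
    return 1.0 / (2.0 * (1.0 + T))


def simulate_upper_tail(T, trials=200_000, seed=1936):
    """Monte Carlo estimate of P(t > T) for pairs drawn uniformly on (-1, 1)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        x1 = rng.uniform(-1.0, 1.0)
        x2 = rng.uniform(-1.0, 1.0)
        if x1 != x2 and student_ratio_pair(x1, x2) > T:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    for T in (0.0, 1.0, 3.0):
        print(T, upper_tail_rectangular(T), simulate_upper_tail(T))
```

The width of the rectangular universe does not matter, since t is unchanged when both sample values are multiplied by the same constant.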


2. Asymptotic Formulas for Moments of t for any Universe. 


Assume that all the sample values are measured from the universe 
mean value zero. Using R. A. Fisher's notation,‡ the ratio t may 
be written in the following form 

$$t = \frac{\sqrt{n'}\,k_1}{\sqrt{k_2}} = \frac{\sqrt{n'}\,k_1}{\sqrt{\kappa_2}\left(1 + \dfrac{k_2 - \kappa_2}{\kappa_2}\right)^{1/2}},$$

where $\kappa_2$ is the universal variance and, in order to find the first four 
universal moments of t (from zero origin) the denominator will be 
expanded formally in powers of $(k_2 - \kappa_2)/\kappa_2$, which is of order $n'^{-1/2}$. 

* Biometrika, Vol. XXI, 1929, pp. 140-141. 
† Biometrika, Vol. XXV, 1933, p. 203. 
‡ Op. cit., p. 203. 

There will appear two-dimensional moments of $k_1$ and $k_2$ for which 
are substituted their values in terms of the two-dimensional semi-invariants 
and then, for the latter, their expressions in terms of the 
semi-invariants of the parent universe according to the simple 
formulae which Fisher* has provided. We find the following 
formulae correct to $n'^{-2}$, i.e., terms of higher order in $1/n'$ are neglected:— 


> 

V = - + if-,(2*3 - 2*5 + 5>^ 4 ) + ...], 

777(1 + V) + 772(3 — *4 — 3*3*5 + 6*3 2 * 4 ) + 

ft ft 

V = - ,71 (|>-3 + ^-,( 210*3 - 66*5 + 105 * 3*4 + 

210V)+ • 


1 

■J’ 


}(ii) 


V = 3 + 1(9 - * 4 + 14V) + 772(102 - 30* 4 + 24* 6 + 
120 V + 1*6 - 132*3*5 - 6* 4 2 + 168V* 4 + 120 V) 


with $\lambda_t = \kappa_t\,\kappa_2^{-t/2}$, where the $\kappa_t$ are the semi-invariants of the parent 
universe. As a partial verification of the algebra it may be noted 
that for $\lambda = 0$ the values of $\mu_1'$ and $\mu_3'$ are zero and 

$$\mu_2' = 1 + \frac{2}{n'} + \frac{6}{n'^2} + \dots = \frac{n'-1}{n'-3}$$

and 

$$\mu_4' = 3\left(1 + \frac{6}{n'} + \frac{34}{n'^2} + \dots\right) = \frac{3(n'-1)^2}{(n'-3)(n'-5)},$$

the expressions on the right representing the correct normal values 
derived from “Student's” distribution.† 

Perhaps the most significant feature of the foregoing formulae is 
that for $\mu_2'$ the term in $n'^{-1}$ reduces to its normal value $2/n'$ when the 
population sampled is symmetrical. For medium-sized samples 
drawn from such universes the normal theory value gives a close 
approximation. It will be noted generally that the numerical 
coefficients of terms containing $\lambda_3$ and $\lambda_5$ (which will vanish for symmetrical 
universes) are much greater than for terms in even order 
$\lambda$'s alone. It follows that we should expect to find “Student's” 
distribution more accurate in its application to symmetrical non-normal 
universes than to skew universes. This result is in accordance 
with the experimental results of E. S. Pearson‡ and with 
“Student's”§ own surmise. 


* Op. cit., p. 206. The method used here for the expansion of t has been 
suggested by that which Fisher used for the expansion of $k_3/k_2^{3/2}$ and $k_4/k_2^2$. 

t M. S. Bartlett (reference belou) has given expressions for and y, 
correct to n ,m ’ 1 . 

‡ Biometrika, Vol. XXI, 1929, p. 274. 

§ Biometrika, Vol. VI, p. 19. 


3. The Distribution of “Student's” Ratio in Samples of any Size 
drawn from a Slightly Asymmetrical Universe. 

The particular asymmetrical universe from which random samples 
of $n'$ are drawn is assumed to be that known as the “Second Approximation 
to the Law of Error,” which takes the form 

$$p(x)\,dx = \frac{1}{\sqrt{2\pi}}\,e^{-x^2/2}\left\{1 + \frac{\lambda_3}{6}\,(x^3 - 3x)\right\}dx, \qquad \text{(iii)}$$

when the universal mean and standard deviation are assumed to be 
zero and unity respectively as they may, for the present purpose, 
without loss of generality. The importance of this distribution 
derives, of course, from its representing the frequency distribution 
of the arithmetic mean of samples of $n'$ when terms in $n'^{-1}$ are presumed 
negligible. In what follows the assumption is made that the 
“skewness” $\lambda_3$, which is supposed known, is so small that terms in 
$\lambda_3^2$ are negligible but there is no limitation as to the number in the 
sample.* The probability of a sample $x_1, x_2, \dots, x_{n'}$ is given by 

$$\prod_{i=1}^{n'} p(x_i)\,dx_i, \qquad \text{(iv)}$$

which differs from the corresponding distribution in the normal case 
(since $\lambda_3^2$ is negligible) by 

$$\frac{\lambda_3}{6}\,(2\pi)^{-n'/2}\,e^{-\frac{1}{2}\sum x_i^2}\,\sum_{i=1}^{n'}(x_i^3 - 3x_i)\prod_{i=1}^{n'} dx_i, \qquad \text{(v)}$$

and the corrective term in t to “Student's” distribution for the 
normal universe will be found by integrating (v) between suitable 
limits, t being given by (i) above. This integration will be performed 
by the now familiar method of transforming orthogonally the original 
variates into new variates $y_i$, one of which, say $y_{n'}$, $= \sqrt{n'}\,\bar{x}$, and 
then changing the new variates $y_1, y_2, \dots$ into generalized polar 
co-ordinates. Any orthogonal transformation will do, but it will be 
convenient to apply the reciprocal of the Helmert transformation 

$$y_1 = \frac{x_1 - x_2}{\sqrt{2}}, \qquad y_2 = \frac{x_1 + x_2 - 2x_3}{\sqrt{6}}, \qquad \dots, \qquad y_{n'} = \frac{x_1 + x_2 + \dots + x_{n'}}{\sqrt{n'}}$$

to the factors of (v). 

* Cf. M. S. Bartlett (Proc. Camb. Phil. Soc., Vol. 31, Part 2, p. 226), who 
has furnished an expression for the distribution of t for fairly large samples but 
taking into account not only $\lambda_3$ ($= \sqrt{\beta_1}$) but also $\lambda_4$ of the parent distribution. 

Then 

n' — 1 2 — 1 nt 3 ft' — 2 n' — l — l 

Sj *‘ S = _ ,5 1 v^'VT) y ' S + v7 + 3 t =i ,=i. V{i+jW+j+ l) 


+ - ^Vy, 2 .(vi) 

VH i=l 

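$$\sum_{i=1}^{n'} x_i = \sqrt{n'}\,y_{n'}, \qquad \prod_{i=1}^{n'} dx_i = \prod_{i=1}^{n'} dy_i,$$

$$\sum_{i=1}^{n'} x_i^2 = y_{n'}^2 + \sum_{i=1}^{n'-1} y_i^2 = y_{n'}^2 + \rho^2, \text{ say},$$

$$t = \frac{\sqrt{n'-1}\;y_{n'}}{\rho} \quad \text{or} \quad y_{n'} = \frac{t\rho}{\sqrt{n'-1}}.$$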
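
The change of variates can be illustrated numerically. The sketch below builds the Helmert rows in the form y_i = (x_1 + ... + x_i - i x_{i+1}) / sqrt(i(i+1)) with last row y_n' = sqrt(n') times the mean, and checks that t computed from the original sample equals sqrt(n' - 1) y_n' / rho. The sample values and function names are arbitrary illustrations.

```python
import math


def helmert_rows(n):
    """Rows of the n x n Helmert matrix; the last row yields sqrt(n) * mean."""
    rows = []
    for i in range(1, n):
        norm = math.sqrt(i * (i + 1))
        # y_i = (x_1 + ... + x_i - i * x_{i+1}) / sqrt(i (i + 1))
        rows.append([1.0 / norm] * i + [-i / norm] + [0.0] * (n - i - 1))
    rows.append([1.0 / math.sqrt(n)] * n)
    return rows


def transform(rows, x):
    """Apply the linear transformation y = H x."""
    return [sum(c * v for c, v in zip(row, x)) for row in rows]


def student_t(x):
    """t = sqrt(n') * mean / s, with s^2 on n' - 1 degrees of freedom."""
    n = len(x)
    mean = sum(x) / n
    s2 = sum((v - mean) ** 2 for v in x) / (n - 1)
    return math.sqrt(n) * mean / math.sqrt(s2)


x = [0.3, -1.2, 0.8, 2.1, -0.4]          # arbitrary illustrative sample
y = transform(helmert_rows(len(x)), x)
rho = math.sqrt(sum(v * v for v in y[:-1]))
t_from_y = math.sqrt(len(x) - 1) * y[-1] / rho

if __name__ == "__main__":
    print(student_t(x), t_from_y)
```

Because the transformation is orthogonal, the sum of squares of the x's equals that of the y's, which is why rho^2 is the sum of squared deviations from the sample mean.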

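The variates $y_1, y_2, \dots$ are transformed into polar co-ordinates 

$$y_1 = \rho \sin\phi_{n'-3} \sin\phi_{n'-4} \dots \sin\phi_1 \sin\phi_0,$$
$$y_2 = \rho \sin\phi_{n'-3} \sin\phi_{n'-4} \dots \sin\phi_1 \cos\phi_0,$$
$$\dots$$
$$y_{n'-1} = \rho \cos\phi_{n'-3},$$

with $0 \le \phi_0 \le 2\pi$ and $0 \le \phi_i \le \pi$ $(i > 0)$, so that 

$$\sum_{i=1}^{n'-1} y_i^2 = \rho^2,$$

$$\prod_{i=1}^{n'-1} dy_i = \rho^{n'-2}\,d\rho \cdot \sin^{n'-3}\phi_{n'-3}\,\sin^{n'-4}\phi_{n'-4} \dots \sin\phi_1 \; d\phi_{n'-3} \dots d\phi_1\,d\phi_0.$$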

On integration to obtain the corrective function of t, the terms resulting 
from the first and third terms on the right-hand side of (vi) 
will vanish because they must always contain cosines of angles to the 
powers of 1 or 3 integrable from 0 to $\pi$. Finally the corrective function 
of t is expressible in integral form 

<■ *1 + r/p 3 - ^ pK '~ ldp 

• o 

with $n = n' - 1$, 

which reduces to the definitive form for the corrective function 


>• ■ «'>*' - k ■ vipW 3 ” ~ 

V n) 


»+T 


It will be observed that $\phi(t)$ is an odd function of t, so that on 
integration between limits $+a$ and $-a$ it vanishes. We are concerned 
principally with the frequencies 

$$\lambda_3\int_{-\infty}^{-t}\phi(t)\,dt \quad \text{and} \quad \lambda_3\int_{t}^{\infty}\phi(t)\,dt,$$

which are equal in magnitude but opposite in sign. The former is 
given by the following expression 

f t)dt = 6 • V / 2>TI^( 1 + D _ ~( X + 

The values of this integral for certain values of n and t are given 
in the following table. Entries that are illegible in this copy are 
marked by two dots. 

Values of $\int_{-\infty}^{-t}\phi(t)\,dt = -\int_{t}^{\infty}\phi(t)\,dt$ 

  n         1         4         9        19        29 
  n'        2         5        10        20        30 

  t 
 0.0       ..        ..        ..      0.0149    0.0121 
 0.5     0.0589      ..      0.0276    0.0196    0.0160 
 1.0     0.0665    0.0495    0.0367    0.0265    0.0218 
 1.5     0.0622      ..      0.0354    0.0258    0.0213 
 2.0     0.0547      ..      0.0263    0.0184    0.0150 
 2.5     0.0476    0.0266    0.0164    0.0104    0.0080 
 3.0     0.0416    0.0184      ..      0.0049    0.0036 
 3.5     0.0368    0.0127      ..      0.0021    0.0013 
 4.0     0.0329    0.0088    0.0027    0.0008    0.0005 


As a check on (vii), it will be found that the first and 
third moments of t calculated from the formula are as follows, to the 
approximations indicated:— 

h' = x J W)dt = - ^h(i + ^ + ■ ■ ■) 1 

1 r | (ix) 

^ = hj + • • •) J 

J— 00 

thus agreeing with the formulae for $\mu_1'$ and $\mu_3'$ given above at (ii). 

The full expression for the probability of obtaining a value of the 
ratio less than an assigned value t for samples of any size drawn from 
the universe (iii), when $\lambda_3^2$ is negligible, is 

$$\int_{-\infty}^{t}\{f(t) + \lambda_3\,\phi(t)\}\,dt,$$

where f(t) is “Student's” distribution for normal samples. It will 
be observed that the expression 

$$\{f(t) + \lambda_3\,\phi(t)\}\,dt$$

represents the probability of the ratio for samples drawn from any 
universe, provided that the samples are so large that terms in $n'^{-1}$ 
can be neglected (but terms in $n'^{-1/2}$ retained), as well as the probability 
of the ratio for samples of any size drawn from the particular universe 
(iii), when terms in $\lambda_3^2$ are neglected. The expressions given above for 
the first terms of the asymptotic expansion of the moments of t have 
indicated that $\lambda_3^2$ (or higher powers of $\lambda_3$) can only occur in terms in 
$n'^{-1}$ (or in higher negative powers of $n'$) so that it is not unlikely that 
for samples of moderate size the probability 

$$\{f(t) + \lambda_3\,\phi(t)\}\,dt$$

has quite an extended range of applicability, provided always that 
at least the lower frequency constants and X 6 are small. This 
is a matter for experimental investigation.* 

As an example, consider the case of $\lambda_3 = 0.25$ and samples of 
10 ($= n'$). It is required to find the probability of t greater than 2. 
From “Student's” table† the probability for normal samples is 
0.0383. Accordingly the probability is 

0.0383 - 0.25 × 0.0263 = 0.0317, 

and the probability of t less than -2 is 0.0449. 
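
The arithmetic of this example can be reproduced directly. In the sketch below the normal-theory tail 0.0383 and the tabulated value 0.0263 (n' = 10, t = 2) are taken from the text; the function name and its arguments are illustrative only.

```python
def corrected_tail(normal_tail, lambda3, table_value, upper=True):
    """Tail probability of t to first order in lambda3.

    normal_tail : probability from "Student's" table for normal samples
    table_value : tabulated integral of the corrective function for the
                  sample size and t concerned
    upper       : True for P(t > t0), False for P(t < -t0)
    """
    sign = -1.0 if upper else 1.0
    return normal_tail + sign * lambda3 * table_value


if __name__ == "__main__":
    print(corrected_tail(0.0383, 0.25, 0.0263, upper=True))
    print(corrected_tail(0.0383, 0.25, 0.0263, upper=False))
```

With positive skewness the correction removes probability from the upper tail of t and adds the same amount to the lower tail.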


* An expression for the distribution of t, correct to »' _1 , is as foliowb:— 



\dtl ^ 24 \dt) ^ 72 \dt) } 


e 


1 At-vi)* 

2 p s 


where r lf i? 3 and r 4 are the semi-in valiants of t, ’which are derivable from 
(ii) above, i.e., 

= Pi' = — A s /2n'i 
«= = ^'-f‘i' a =l+ 5 ^, (8 + 7A ! a ) 
o 3 = -2A 3 /n'i 


»4 = 




6 - 2Aj + 


27A S 2 


† Metron, Vol. V, No. 3, 1925, pp. 114-118. 


Some Notes on Insecticide Tests in the Laboratory 
and in the Field. 

By M. S. Bartlett. 

I.—Laboratory Experiments. 

1. Dosage-Mortality Data. 

Two admirable papers by Bliss² leave little to be said on the 
general statistical principles involved in analysing toxicity data 
obtained from quantitative experiments in the laboratory. It will 
thus be assumed here that these principles are known. An example 
of the type of data that accrue in laboratory work is cited below as a 
useful illustration of some of the points to be considered, and as a 
contrast with a later example, used to illustrate experimental data 
collected in the field. It should be stressed that these particular 
examples are considered here mainly in relation to their value as 
illustrations of statistical method. 

Some laboratory experiments by Dr. H. H. S. Bovingdon on the 
control of the bed-bug (Cimex lectularius L.) by measuring its 
resistance to a certain fumigant gave results which were adequately 
represented by a linear relation between mortality (measured in 
probits) and log. dose. (About 30 insects were used at each dosage.) 
The estimated regression equation for 2 hours' exposure, for example, 
was:— 

$$y = 18.2185\,x - 32.787, \qquad \chi^2 = 24.010 \ (14 \text{ d.f.}),$$

where y stands for probits and x for log. dose (dose in mgs./litre). 

The estimated dosages for 50 per cent. kill (5 probits), 90 per cent. 
kill (6.282 probits) and 99 per cent. kill (7.326 probits) may also be 
given. 

Table I. 

Mortality.     Dosage. 
   50%       118.6 ± 1.3 
   90%       139.5 ± 2.5 
   99%       159.2 ± 4.7 
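
The dosages in Table I follow from inverting the fitted line. The sketch below assumes the usual definition of a probit as 5 plus the standard normal deviate and uses the slope and intercept quoted in the text; the function names are illustrative.

```python
from statistics import NormalDist

SLOPE = 18.2185       # regression coefficient of probit on log10 dose (from the text)
INTERCEPT = -32.787   # intercept of the fitted line (from the text)


def probit(fraction_killed):
    """Probit = 5 + standard normal deviate of the fraction killed."""
    return 5.0 + NormalDist().inv_cdf(fraction_killed)


def dose_for_kill(fraction_killed):
    """Dose (mgs./litre) at which the fitted line predicts the given kill."""
    log_dose = (probit(fraction_killed) - INTERCEPT) / SLOPE
    return 10.0 ** log_dose


if __name__ == "__main__":
    for kill in (0.50, 0.90, 0.99):
        print(f"{kill:.0%} kill: {dose_for_kill(kill):.1f} mgs./litre")
```

The standard errors attached to the dosages in Table I are not reproduced here, since they need the weights of the individual observations, which the paper does not list.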


2. Standard Errors. 

It will be noticed that approximate standard errors have been 
added to the estimated dosages. These are not intended to replace 
exact fiducial limits, or tests of significance of the departure of the 
regression line from other regression lines, but are nevertheless often 
useful in indicating sufficiently well in a summary of the data the 
order of accuracy reached. 

They are obtained from the formula 


or 




y2 f X 

= »b 2 iS(u’) + &SJw[x 


(y - yf 


m 


}■ 



■ ■ (i) 


approximately, where n is the number of degrees of freedom, and 
w the weight of each mortality observation (for notation, cf.¹). 
This approximate formula may be compared with the exact fiducial 
error given in equation 28 of Bliss's second paper,² with which it 
would be identical if we neglected 

$$\frac{t^2\,V(b)}{b^2}.$$

From the standard error for the log. dose, we can give a corresponding 
standard error for the dose to the same order of approximation, 
if this standard error be sufficiently small (larger errors in the 
dose will be asymmetrical). 


3. Time of Exposure. 

The example cited above was one of a set of four sets of experiments 
carried out for four times of exposure. The estimated doses 
for 50 per cent. kill were:— 

Table II. 

Time (hrs.).    Dosage for 50 per cent. kill. 
     2           118.6 ± 1.30,  (124.9) 
     5            71.64 ± 1.22, (62.80) 
    16½           21.43 ± 0.97, (25.64) 
    24            18.59 ± 1.35, (17.29) 


If these four determinations are plotted on a log. t, log. dose 
scale, they appear to fall approximately on a straight line. A line 
drawn by eye from this graph gives the empirical equation 

$$C\,t^{3/4} = 210,$$

where C denotes the dose. The expected dosages from this equation 
are given in brackets in Table II. The deviations from these 
bracketed values do not show a systematic departure, which would 
suggest that the type of law considered was wrong, but they are 
obviously significant, so much so that it is unnecessary to test the 
point exactly by calculating a regression line on the log. scale with 
an appropriate weighting of the four points. 
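
The bracketed values of Table II can be compared with an empirical law of the form C t^k = 210. The sketch below assumes the exponent k = 3/4, which is inferred here from the bracketed values for 2, 5 and 16½ hours; the printed bracketed value for 24 hours is not reproduced exactly by this exponent, so the reading of either the exponent or that entry may be at fault.

```python
def expected_dose(hours, constant=210.0, exponent=0.75):
    """Dose C satisfying C * hours**exponent = constant (assumed form of the law)."""
    return constant / hours ** exponent


# Observed 50 per cent. doses and their approximate standard errors (Table II).
OBSERVED = {2.0: (118.6, 1.30), 5.0: (71.64, 1.22), 16.5: (21.43, 0.97), 24.0: (18.59, 1.35)}

if __name__ == "__main__":
    for hours, (dose, se) in OBSERVED.items():
        fitted = expected_dose(hours)
        # Deviation expressed in units of the approximate standard error.
        print(hours, round(fitted, 2), round((dose - fitted) / se, 1))
```

Dividing each deviation by its standard error makes plain why the discrepancies are described as obviously significant.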

At first sight these apparently unsystematic but nevertheless 
significant discrepancies might appear puzzling, but further knowledge 
of the insect population dealt with suggests that some at least 
of the discrepancy may perhaps be explained. 

4. Heterogeneity of Populations. 

Bliss¹ has pointed out that where the population of insects is 
not rigidly biologically controlled, we cannot expect our experimental 
error necessarily to be confined to the inevitable sampling 
error due to the use of a finite number of insects. Although, however, 
the values of $\chi^2$ obtained may be in excess of the theoretical 
sampling values, we may still, with certain precautions, use them to 
indicate the error actually obtained. In the experiments referred 
to here, the values of $\chi^2$ were as given in Table III, thus indicating 
heterogeneity. 

Table III. 

Time (hrs.).      χ².      d.f. 
     2           24.01      14 
     5           11.47      11 
    16½          15.67      11 
    24           32.27      10 
   Total         83.42      46 
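
The pooled heterogeneity in Table III can be checked with a short calculation. The sketch below sums the four values and evaluates the upper tail of the chi-square distribution with the closed form that holds for an even number of degrees of freedom; the function name is illustrative, and the closed form is a standard identity rather than anything taken from the paper.

```python
import math

# (chi-square, degrees of freedom) for each time of exposure, from Table III.
TABLE_III = {2.0: (24.01, 14), 5.0: (11.47, 11), 16.5: (15.67, 11), 24.0: (32.27, 10)}


def chi2_upper_tail_even_df(chi2, df):
    """P(X > chi2) for a chi-square variate with an even number of d.f.

    Uses Q = exp(-x/2) * sum_{j < df/2} (x/2)^j / j!.
    """
    if df % 2:
        raise ValueError("closed form requires an even number of d.f.")
    half = chi2 / 2.0
    term = math.exp(-half)
    total = term
    for j in range(1, df // 2):
        term *= half / j
        total += term
    return total


total_chi2 = sum(c for c, _ in TABLE_III.values())
total_df = sum(d for _, d in TABLE_III.values())
pooled_p = chi2_upper_tail_even_df(total_chi2, total_df)

if __name__ == "__main__":
    print(total_chi2, total_df, total_chi2 / total_df, pooled_p)
```

The ratio of the pooled chi-square to its degrees of freedom gives a rough heterogeneity factor by which sampling variances would need to be inflated.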


This was not altogether unexpected; for example, difficulty in 
maintaining a stock of insects would lead to the pooling of insects 
of different stages of development. In the present case adults, 
and nymphs of just under the adult stage, though pooled in the 
experiments, were found on investigation (for the 16½- and 24-hour 
periods) to give significantly different results, the adults being 
more resistant. 

The testing of the difference between regression lines obtained 
for each type separately, carried out by subtracting the values of 
$\chi^2$ for each line separately from the value of $\chi^2$ when a single line 
is fitted to the data (with the observations preserving their original 
weights), gave the following analyses of variance. The 2 d.f. for 
the difference are readily split up further into the difference in 
position and difference in slope, by consideration of the value of 
$\chi^2$ for the whole data after separate elimination of the means of the 
two groups. 


Table IV. 

                                   16½ Hours.                   24 Hours. 
                             d.f.     χ².    Variance.    d.f.     χ².    Variance. 
Adults v. Nymphs               2    37.264    18.632        2    26.226    13.113 
  Difference in position       1    36.737      —           1    26.216      — 
  Difference in slope          1     0.527      —           1     0.010      — 
Error                         17    46.692     2.747       22    41.194     1.872 
Total                         19    83.956      —          24    67.420      — 

While the error $\chi^2$'s are still significant, the comparison of the 
adults v. nymphs variance (for difference in position) with the error 
term gives a highly significant result. 
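
The comparison described can be made explicit as a variance ratio. The sketch below divides the chi-square for difference in position (1 d.f.) by the error variance, using the entries of Table IV; the function name is illustrative, and judging the ratio against tabulated values of the variance-ratio distribution is left to the reader.

```python
def position_variance_ratio(position_chi2, error_chi2, error_df):
    """Ratio of the 1 d.f. 'difference in position' term to the error variance."""
    return position_chi2 / (error_chi2 / error_df)


ratio_16_5_hours = position_variance_ratio(36.737, 46.692, 17)
ratio_24_hours = position_variance_ratio(26.216, 41.194, 22)

if __name__ == "__main__":
    print(round(ratio_16_5_hours, 2), "on 1 and 17 d.f.")
    print(round(ratio_24_hours, 2), "on 1 and 22 d.f.")
```

Using the error variance rather than unity as the divisor allows for the heterogeneity that the significant error chi-squares reveal.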

An examination of the information available revealed, moreover, 
that the pooling of adults and nymphs was not random, as 
shown by the totals in Table V. 

Table V. 

Time (hrs.).    Adults.    Nymphs.    Total. 
    16½           139        204        343 
    24            253        156        409 
   Total          392        360        752 


While it is hardly worth while trying to see whether the apparent 
inconsistencies in the results given earlier can be adequately accounted 
for by such tendencies to bias, especially as the necessary 
information required is not complete, it is evident that inconsistencies 
will tend to arise from such causes. It should be emphasized 
that this degree of lack of control over the stock of insects available 
had always been recognized by the experimenter, and the data 
thus never claimed to be of a completely rigorous quantitative 
type; nevertheless, the above examination of the data is interesting 
in stressing the importance of precautions when biologically 
uncontrolled stocks are used if valid comparisons are to be made, 
and standard errors to have any meaning. 

5. Natural Mortality. 

When the population tested has a natural mortality of y per cent. 
for the period of the experiment, then the formula for the estimated 
per cent. kill is 

$$\frac{100\,(x - y)}{100 - y},$$

where x is the observed percentage mortality. The value of y is 
often small enough to be neglected, especially as it has little effect 
on the higher per cent. kills, which are of more interest; but it is 
of some importance in any extensive survey to make sure whether 
or not y can be neglected, for the probit value for the observed 
mortality must have a lower limit corresponding to the natural 
mortality. For example, with a natural mortality of 5 per cent., 
the observed probit value will not decrease indefinitely with the dose, 
but tend asymptotically to the value 3.36. One might thus erroneously 
conclude that a linear relationship between per cent. kill 
measured in probits and log. dose no longer held at the lower dosages, 
though this would rarely be sufficient to cause a rejection of the 
probit-log. dose scale if this were in fact adequate to represent the 
true mortality law over the range required. 
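
The correction and the lower limit quoted above can be illustrated as follows. The sketch assumes the usual definition of a probit as 5 plus the standard normal deviate; the function names are illustrative.

```python
from statistics import NormalDist


def corrected_kill(observed_pct, natural_pct):
    """Estimated per cent. kill, 100 (x - y) / (100 - y)."""
    return 100.0 * (observed_pct - natural_pct) / (100.0 - natural_pct)


def probit_floor(natural_pct):
    """Lowest probit the observed mortality can approach as the dose falls."""
    return 5.0 + NormalDist().inv_cdf(natural_pct / 100.0)


if __name__ == "__main__":
    print(corrected_kill(52.5, 5.0))   # an observed 52.5 per cent. is a true 50 per cent. kill
    print(round(probit_floor(5.0), 2))
```

The correction is largest at low mortalities and vanishes at 100 per cent., which is why it has little effect on the higher kills.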

II.—Field Experiments. 

1. Layout and Statistical Analysis. 

If treatments are being tested on a crop with a view to the 
control of disease, weed infection or pest damage, and the crop 
yields are taken as a practical measure of the values of the different 
treatments, a randomized block or Latin square layout will normally 
be used, and the results analysed in the usual way by the analysis 
of variance method. If, however, the actual counting of numbers 
of diseased plants, weeds or pests is the main intention of the experiment, 
and these counts constitute our data, certain modifications 
from a direct analysis are frequently advisable, and the suitability 
of the design should be re-examined. 

Owing to the non-random character of most infection, or of 
plant stand, the need for replication, which has stimulated the 
design of the usual layouts, will of course still persist. The randomized 
block layout can consequently be recommended. The 
Latin square, on the other hand, usually cannot; for example, if 
any reasonable pest control has been achieved by treatment, the 
untreated plots included may be anomalous in comparison, and 
may have to be omitted from any statistical analysis intended to 
test treatment differences. 

It has been pointed out³ that, though the range of applicability 
of the direct use of analysis of variance may be expected to cover 
many types of data which are of the nature of counts, data consisting 
of fairly small numbers may often be more validly analysed 
on a square-root scale. Another example of the type of data 
referred to is included here (Table VI), as it will be useful for further 
reference in connection with some points in experimental design. 
In an experiment by Mr. F. J. D. Thomas on the control of “leatherjackets,” 
counts were made some days after the application of toxic 
emulsions to the plots, the figures denoting two sample counts 
(1 sq. ft. each) on 1 sq. yd. plots. The counts were made after the 
leatherjackets were brought to the surface again by a standard 
emulsion. It should be noticed that although all the leatherjackets 
do not come to the surface, the use of a standard emulsion renders 
the counts comparable; if counts were also taken at the time of 
application of treatments, these counts would be dependent on the 
differential power of the different treatments in bringing up the 
leatherjackets. The possibility of migration before the count was 
taken has been neglected here. 

Table VI. 

Leatherjacket Counts. 

                 1 (Control).   2 (Control).    3.     4.     5.     6. 
Block I    (i)       33             30          10     12      6     17 
           (ii)      59             36          11     17     10      8 
      II   (i)       36             23          13      6      4      3 
           (ii)      24             23          20      4      7      2 
      III  (i)       19             42          10     12      4      6 
           (ii)      27             39           7     10     12      3 
      IV   (i)       71             39          17      5      5      1 
           (ii)      49             20          26      8      5      1 
      V    (i)       22             42          14     12      2      2 
           (ii)      27             22          11     12      6      5 
      VI   (i)       84             23          22     16     17      6 
           (ii)      50             37          30      4     11      5 

A square-root analysis of the totals for the two sample counts 
on the treated plots completes the following summary of results. 

Table VII. 

                          1 and 2.     3.      4.      5.      6.     s.e.    Sig. Diff. (P = 0.05). 
Mean √(x + ½)                —        5.58    4.43    3.84    3.03    0.407        1.23 
Mean no./plot               73.1      31.8    19.7    14.8     9.8     —            — 
Estimated “% Control”        —         56      73      80      87      —            — 

The “percentage control” figures included refer to the control 
of the pest, and are given by 100 - p, where p is the percentage 
number in terms of the average number observed on the untreated 
(“control”) plots. 
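
The summary in Table VII can be recomputed from the counts of Table VI. The sketch below is one reading of the procedure: plot totals of the two sample counts, the transformation √(x + ½) applied to those totals, and percentage control taken from the untransformed means. Two entries of Table VI are illegible in this copy and are assumed here to be 10 (treatment 3, block I, first count) and 11 (treatment 5, block VI, second count), values which agree with the printed means.

```python
import math

# (count i, count ii) for blocks I to VI, transcribed from Table VI.
# Assumption: the illegible entries are read as 10 (treatment 3, block I)
# and 11 (treatment 5, block VI).
COUNTS = {
    "control 1": [(33, 59), (36, 24), (19, 27), (71, 49), (22, 27), (84, 50)],
    "control 2": [(30, 36), (23, 23), (42, 39), (39, 20), (42, 22), (23, 37)],
    "3": [(10, 11), (13, 20), (10, 7), (17, 26), (14, 11), (22, 30)],
    "4": [(12, 17), (6, 4), (12, 10), (5, 8), (12, 12), (16, 4)],
    "5": [(6, 10), (4, 7), (4, 12), (5, 5), (2, 6), (17, 11)],
    "6": [(17, 8), (3, 2), (6, 3), (1, 1), (2, 5), (6, 5)],
}


def plot_totals(name):
    """Total of the two sample counts on each plot."""
    return [a + b for a, b in COUNTS[name]]


def mean(values):
    return sum(values) / len(values)


control_mean = mean(plot_totals("control 1") + plot_totals("control 2"))

summary = {}
for name in ("3", "4", "5", "6"):
    totals = plot_totals(name)
    summary[name] = {
        "mean_sqrt": mean([math.sqrt(t + 0.5) for t in totals]),
        "mean_per_plot": mean(totals),
        "percent_control": 100.0 - 100.0 * mean(totals) / control_mean,
    }

if __name__ == "__main__":
    print(round(control_mean, 1))
    for name, row in summary.items():
        print(name, {k: round(v, 2) for k, v in row.items()})
```

The standard error and significant difference in Table VII come from the analysis of variance of the transformed totals, which is not repeated here.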

2. Relative Importance of Sampling and Replication. 

The separate recording of the two sample counts allows us to 
form some idea of the value of this method of counting in such an 
experiment. If we ignore possible block differences, we obtain 
for the variability of the direct counts the following estimates 
(Table VIII). 

Table VIII. 

                 d.f.    1 and 2.    3 and 4.    5 and 6. 
Mean              —        36.5        12.9         6.2 
σ1²               12      148.0        16.0        10.1 
σ1² + 2σ2²        11      411.5        75.8        30.0 
σ2²               —       131.8        29.9        10.0 

In this table $\sigma_1^2$ denotes the variance between duplicate samples, 
$\sigma_1^2 + 2\sigma_2^2$ the variance observed between plots, and $\sigma_2^2$ in the 
last row the estimate of the plot variance, in the sense that the 
observed variance will be $\sigma_2^2 + \sigma_1^2/n$, if n is the number of sample 
counts made per plot. If we take roughly $\sigma_1^2 = \sigma_2^2$ and consider 
the alternative of taking only one sample count per plot, we have 
for equal accuracy 

$$\frac{2}{n_1} = \frac{3}{2 n_2},$$

where $n_1$ is the number of replications with one sample count that 
will be equivalent to $n_2$ replications with two sample counts. That 
is, for example, eight replications with one count would appear 
comparable to six replications with two counts, and since the actual 
counting is often the part of the experiment which involves most 
labour, the single counts would probably be considered most 
profitable. 
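
The variance components of Table VIII can be recovered from the duplicate counts. The sketch below shows the calculation for the control plots and for treatments 5 and 6, ignoring block and treatment differences within each group as the table does; the counts are transcribed from Table VI, with the illegible entry for treatment 5, block VI assumed to be 11. The function name is illustrative.

```python
def variance_components(pairs):
    """Return (within, between, plot) variances on a single-count basis.

    within  : variance between duplicate samples (sigma_1^2), one d.f. per plot
    between : variance between plots, estimating sigma_1^2 + 2 sigma_2^2
    plot    : estimate of the plot variance sigma_2^2
    """
    n_plots = len(pairs)
    within = sum((a - b) ** 2 for a, b in pairs) / (2.0 * n_plots)
    totals = [a + b for a, b in pairs]
    mean_total = sum(totals) / n_plots
    between = sum((t - mean_total) ** 2 for t in totals) / (2.0 * (n_plots - 1))
    return within, between, (between - within) / 2.0


CONTROL_PLOTS = [(33, 59), (36, 24), (19, 27), (71, 49), (22, 27), (84, 50),
                 (30, 36), (23, 23), (42, 39), (39, 20), (42, 22), (23, 37)]
# Treatments 5 and 6; the entry 11 in (17, 11) is an assumed reading.
TREATMENTS_5_6 = [(6, 10), (4, 7), (4, 12), (5, 5), (2, 6), (17, 11),
                  (17, 8), (3, 2), (6, 3), (1, 1), (2, 5), (6, 5)]

control_components = variance_components(CONTROL_PLOTS)
treated_components = variance_components(TREATMENTS_5_6)

if __name__ == "__main__":
    print([round(v, 1) for v in control_components])
    print([round(v, 1) for v in treated_components])
```

The column for treatments 3 and 4 is left out because one of its entries in Table VI is uncertain in this copy.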

In another experiment, it was found $\sigma_1^2 = \sigma_2^2$ approximately 
for the control plots, but for the treated plots, $\sigma_1^2 = 2\sigma_2^2$ approximately. 
Confining our attention here to the treated plots, we should 
obtain from this result, 

$$\frac{3}{n_1} = \frac{2}{n_2}.$$

On the basis of the two experiments, we should conclude that three 
replications with one count would appear at least as good as two 
replications with two counts. 
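
Both equivalences follow from the same rule: a plot with n sample counts has variance σ2² + σ1²/n, and accuracy is equal when the plot variance divided by the number of replications is the same. A minimal sketch, with illustrative function names and exact fractions to avoid rounding:

```python
from fractions import Fraction


def plot_variance(sigma1_sq, sigma2_sq, counts_per_plot):
    """Variance of a plot mean based on the given number of sample counts."""
    return sigma2_sq + Fraction(sigma1_sq) / counts_per_plot


def equivalent_single_count_replications(sigma1_sq, sigma2_sq, n2):
    """Replications with one count that match n2 replications with two counts."""
    one = plot_variance(sigma1_sq, sigma2_sq, 1)
    two = plot_variance(sigma1_sq, sigma2_sq, 2)
    return n2 * one / two


if __name__ == "__main__":
    # sigma_1^2 = sigma_2^2: six replications with two counts
    print(equivalent_single_count_replications(1, 1, 6))
    # sigma_1^2 = 2 sigma_2^2: two replications with two counts
    print(equivalent_single_count_replications(2, 1, 2))
```

Only the ratio of the two variances matters, so they are entered in units of σ2².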

3. Effect of the Degree of Control on Efficiency. 

It is clear that the type of field data we are considering, though 
the conditions must necessarily be variable and the mode of action 
of treatments obscure, has a certain comparability at least with 
more exact laboratory experimentation on toxicity. Now, we know 
that though the ultimate object of any laboratory experiments 
may be to find a treatment and dose capable of killing all or 
practically all the organisms under test, for the preliminary purpose 
of comparing different treatments at single or isolated dosages, we 
achieve most accuracy at the 50 per cent. kill point.¹ If, however, 
we imagine for the moment an ideal field experiment analogous 
to the leatherjacket experiment already considered, where only 
the numbers of leatherjackets surviving after the application of 
treatments are observed, and the natural variation experienced is 
strictly random and of a Poisson type, then the variance of the 
number not killed will be nq (where n is the average number treated, 
and p = 1 - q the fraction killed), and not np(1 - p). We should 
therefore expect our information on the relative efficacy of different 
treatments to be given by 

$$I' = I\,(1 - q), \qquad (2)$$

where I is the information obtained in the binomial or laboratory 
case. The variation of I (binomial) and I′ (Poisson) with the 
percentage p is shown in Fig. 1. 

Fig. 1.—The Change in Information with “Degree of Control” for Different 
Theoretical Distributions (see text). 

If our variation is not strictly Poisson, but the variance still 
proportional to the mean, the information I′ will still be proportional 
to the formula (2), and reach its maximum at the same value of p. 


In practice, however, it has been observed that a better fit for the 
variance is often obtained if we write 


g 2 = xm( 1 + pm) 

where x > 1 and p > 0. This may be written more conveniently 
a 2 = a»*(l -r a q) . (3) 


Thus in Table VIII it will be noticed that the observed plot variance 
g x 2 4- 2a 2 2 does not appear, especially if the control plots are com 
bidered, strictly proportional to the mean, but increases with m 
more rapidly. A better fit which would include the control plots 
would in this case be 3w(l + Sq). Consequently the more general 
formula 


$$I'_\lambda = \frac{I(1 - q)}{1 + \lambda q} \qquad (4)$$

has also been plotted in Fig. 1 for certain values of $\lambda$, the information
for a natural variance given by (3) being proportional to $I'_\lambda$. The
binomial case is included in (4) for $\lambda = -1$. From the graph it
appears that for moderate values of $\lambda$ we may regard 80 per cent.
control as our optimum.
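
The effect of the constant in formula (4) can be tabulated directly. The following is only a sketch: it reads formula (4) as the ratio I'/I = (1 − q)/(1 + λq), and the function name and the particular values of λ are illustrative assumptions, not taken from the paper. It gives the factor by which the binomial information is multiplied, not the position of the optimum, which also depends on the form of I itself.

```python
def relative_information(q, lam):
    """Return I'_lambda / I = (1 - q) / (1 + lam * q).

    q   : fraction of organisms surviving, 0 <= q < 1
    lam : the constant lambda of formula (3); lam = -1 is the binomial case,
          lam = 0 the Poisson case (assumed reading of formula (4))
    """
    if not 0.0 <= q < 1.0:
        raise ValueError("q must lie in [0, 1)")
    return (1.0 - q) / (1.0 + lam * q)


if __name__ == "__main__":
    # Illustrative values of lambda only; they are not those plotted in Fig. 1.
    for lam in (-1, 0, 1, 3):
        row = ", ".join(
            f"{relative_information(q / 10, lam):.3f}" for q in range(1, 10)
        )
        print(f"lambda = {lam:>2}: {row}")
```

For λ = −1 the factor is unity at every q, which is the sense in which the binomial case is included in (4); for λ = 0 it reduces to the factor (1 − q) of formula (2).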

It may justifiably be objected that the above formulae if applied 
to field conditions rest on very insecure or even erroneous assumptions;
but it is important to realize that, if an experiment were
being planned where some choice in the level of control to be achieved 
by the treatments existed, the formulae are only employed in an 
attempt to obtain optimum efficiency, the validity of the experiment 
remaining unaltered. Moreover, a considerable widening of the 
argument would still indicate that the position of the optimum 
percentage control at 50 in laboratory work would tend to be displaced
to a higher level of control if the number of organisms actually
used in each test were unknown. 

The information obtained from a field experiment will naturally 
depend on the density of the pest in the experimental area, in the 
same way that the information on control of plant disease depends 
on the extent to which the disease is present. The area of the 
plot or sampled area and the level of control reached should if 
possible be such that the number of organisms observed on the 
treated plots has not become too small—say less than about five 
on the average for most of the treatments. 

Finally it should be pointed out that the conclusions drawn 
from considering a particular type of experiment, as for example 
the leatherjacket experiment commented on here, may not necessarily 
be applicable or even relevant to other types of field counting 
experiments. They may still, however, indicate the sort of question 





that may be asked in connection with attempts to ensure the validity 
and efficiency of these experiments. 

Acknowledgments. 

I am greatly indebted to Dr. H. H. S. Bovingdon and Mr. F. J. D. 
Thomas of Hawthorndale Laboratories, Jealott’s Hill Research 
Station, for allowing me the use of the experimental data referred 
to in this paper. 

Summary. 

An example of a set of laboratory experiments on toxicity is 
used to illustrate some statistical points and methods. 

The layout and analysis of field counting experiments, such as 
occur in pest-control experimentation, are considered, and the 
efficiency of an experiment on the control of leatherjackets is
examined. 

References. 

1 C. I. Bliss, Ann. App. Biol., 22 (1935), 134-167.

2 C. I. Bliss, Ann. App. Biol., 22 (1935), 307-333.

3 M. S. Bartlett, J.R.S.S. Supplt., 3 (1936), 68-78.





An Enumeration of the Confounded Arrangements in the 
2x2x2... Factorial Designs. 

By M. M. Barnard, M.A., B.Sc. (Melbourne). 

Statistical Department, Rothamsted Experimental Station. 

§ 1. Introduction.

The type of factorial design known as the 2 x 2 x 2 . . . (= $2^n$)
has been familiar to agricultural experimentalists for a number of 
years. It is the simplest of the factorial systems, and is useful in 
exploring the possibilities of new manurial treatments and methods 
of cultivation, and the interactions which may exist between them. 
Its application is not confined to agricultural field trials, and there is 
little doubt that, as the advantages of factorial design become more 
widely recognized, it will prove to be of use in an increasing number 
of fields of scientific investigation. Various similar designs have 
already been utilized, for example, in animal husbandry experiments. 

In this system any number, n, of treatment factors occur at each
of two different levels 1, 2, these levels being not necessarily the same
for all the factors considered. There are thus $2^n$ different possible
treatment combinations. The material at the experimenter's
disposal may be grouped into sets of some power of two on the grounds
of similarity. These sets might be blocks of land containing 2, 4,
8 . . . plots as in an agricultural field trial, or litters of pigs in one
concerned with animal husbandry. If each of the blocks contains
only 1/2, 1/4, etc., of the treatment combinations, some of the comparisons
between the treatments are necessarily confounded with block differences
in each replication, but, by confounding different comparisons
in the different replications, some information on all comparisons
may be obtained. It is the aim of the present paper to enumerate
these possibilities of confounding, and, with this object in view, it is
useful to consider first of all the structure of the systems.


§ 2. The Structure of the $2^n$ System.

A set of $2^n$ values of the treatment combinations can be replaced
by any set of $2^n$ independent linear functions of these values. Such
a set of linear functions is said to be an orthogonal one, if, for every 
pair of functions in the set, the sum of the products of the coefficients 





of the corresponding values is zero. The mean, the main effects 
and the interactions form such a group.* 

The structure can best be seen by setting out such a diagrammatic 
scheme as that given in Fig. 1. The $2^4$ system has been used as an
example, but precisely similar schemes will apply in all cases. In
this arrangement there are sixteen different treatment combinations
of the type $a_1 b_1 c_1 d_1$, etc. Each of the fifteen treatment comparisons
denoted, as shown in the left-hand column of the table, by 
A, B, C ... is represented by one of the sets of signs given in the 
body of the table. The headings at the top of the table, indicating 
treatment combinations, are those appropriate to the assignment of 
the sets of signs. The symbols on the right-hand side and at the 
foot of the table will be referred to subsequently. 

Figure 1.

The $2^4$ system.

                              Treatment Combinations.

Treatment      a2 a2 a2 a2 a2 a2 a2 a2 a1 a1 a1 a1 a1 a1 a1 a1   Treatment
Comparisons    b2 b2 b2 b2 b1 b1 b1 b1 b2 b2 b2 b2 b1 b1 b1 b1   Comparisons
(standard      c2 c2 c1 c1 c2 c2 c1 c1 c2 c2 c1 c1 c2 c2 c1 c1   (arbitrarily
arrangement).  d2 d1 d2 d1 d2 d1 d2 d1 d2 d1 d2 d1 d2 d1 d2 d1   assigned).

A              +  +  +  +  +  +  +  +  -  -  -  -  -  -  -  -    PQ
B              +  +  +  +  -  -  -  -  +  +  +  +  -  -  -  -    RS
C              +  +  -  -  +  +  -  -  +  +  -  -  +  +  -  -    P
D              +  -  +  -  +  -  +  -  +  -  +  -  +  -  +  -    S
AB             +  +  +  +  -  -  -  -  -  -  -  -  +  +  +  +    PQRS
AC             +  +  -  -  +  +  -  -  -  -  +  +  -  -  +  +    Q
AD             +  -  +  -  +  -  +  -  -  +  -  +  -  +  -  +    PQS
BC             +  +  -  -  -  -  +  +  +  +  -  -  -  -  +  +    PRS
BD             +  -  +  -  -  +  -  +  +  -  +  -  -  +  -  +    R
CD             +  -  -  +  +  -  -  +  +  -  -  +  +  -  -  +    PS
ABC            +  +  -  -  -  -  +  +  -  -  +  +  +  +  -  -    QRS
ABD            +  -  +  -  -  +  -  +  -  +  -  +  +  -  +  -    PQR
ACD            +  -  -  +  +  -  -  +  -  +  +  -  -  +  +  -    QS
BCD            +  -  -  +  -  +  +  -  +  -  -  +  -  +  +  -    PR
ABCD           +  -  -  +  -  +  +  -  -  +  +  -  +  -  -  +    QR

               p2 p2 p1 p1 p2 p2 p1 p1 p2 p2 p1 p1 p2 p2 p1 p1
               q2 q2 q1 q1 q2 q2 q1 q1 q1 q1 q2 q2 q1 q1 q2 q2
               r2 r1 r2 r1 r1 r2 r1 r2 r2 r1 r2 r1 r1 r2 r1 r2
               s2 s1 s2 s1 s2 s1 s2 s1 s2 s1 s2 s1 s2 s1 s2 s1


* The main effect of any factor a has been defined as the mean of the
responses to the factor a in the presence of all combinations of the remaining
factors; i.e. with three factors a, b and c, the main effect of a is represented
symbolically by the expansion of $\tfrac{1}{4}(a_2 - a_1)(b_2 + b_1)(c_2 + c_1)$.

The first-order interaction of a and b is similarly defined as one-half the
mean of the differences in the responses to the factor a at the two levels of b
in the presence of all combinations of the remaining factors; i.e. with the same
three factors, the first-order interaction of a and b is represented by
$\tfrac{1}{4}(a_2 - a_1)(b_2 - b_1)(c_2 + c_1)$. Similarly with higher-order interactions.








It will be seen that the interaction between any two main effects 
is represented by the set of signs obtained by multiplying together 
the corresponding signs of the pair of main effects considered, and 
that the same is true of any two treatment comparisons which have 
no treatment factor in common. This rule can be extended to include
the generalized interaction of any two lines, provided that if
the comparisons represented by these two lines have a factor, or
factors, in common (such as AB and BC), these factors do not
appear in the resultant generalized interaction (AC, in the example
chosen).
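
The multiplication rule can be checked mechanically. The sketch below is an illustration only: the function names are hypothetical, level 2 of a factor is coded +1 and level 1 is coded −1, and the treatment combinations are taken in the standard order with the first factor changing most slowly.

```python
from itertools import combinations, product

FACTORS = "ABCD"


def columns(factors=FACTORS):
    """Treatment combinations in standard order: level 2 -> +1, level 1 -> -1."""
    return list(product((+1, -1), repeat=len(factors)))


def signs(comparison, factors=FACTORS):
    """Sign row of a comparison such as 'AB': product of its main-effect signs."""
    row = []
    for col in columns(factors):
        s = 1
        for f in comparison:
            s *= col[factors.index(f)]
        row.append(s)
    return row


def generalized_interaction(x, y):
    """Letters occurring in exactly one of x and y; common factors drop out."""
    return "".join(sorted(set(x) ^ set(y)))


# The fifteen comparisons A, B, ..., ABCD of the 2^4 system.
comparisons = ["".join(c) for r in range(1, 5) for c in combinations(FACTORS, r)]

if __name__ == "__main__":
    print(generalized_interaction("AB", "BC"))
    for name in comparisons:
        print(f"{name:<5}", " ".join("+" if s > 0 else "-" for s in signs(name)))
```

Multiplying any two sign rows element by element gives the row of their generalized interaction, and any two distinct rows have a zero sum of products, which is the orthogonality property described in § 2.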

This property implies that there is an internal symmetry in the 
table, such that if any appropriately chosen set of four (or, in the 
case of n treatment factors, n) lines be styled the main effects—the 
designation of the columns indicating treatment combinations 
being determined by the lines chosen—the remainder of the table 
will represent the interactions. The lines which can be "appropriately
chosen" can be determined by successive steps. Thus if we
now represent the treatment factors by p, q, r, s, and the treatment
comparisons by the corresponding capital letters, any two lines may
be taken to represent the main effects P and Q, respectively. The line
assignable to the interaction PQ is then determined. Any one of
the remaining lines may be chosen to represent R, the lines representing
PR, QR, and PQR being then fixed. Finally any one of the
remaining lines may be taken to represent S; the rest of the interactions
are determined in accordance with this final selection. This
process is indicated by the symbols on the right-hand side and at 
the foot of the table. 

The initial choice need not be confined to main effects. The 
first selection might have been the assignment of the first two lines 
of the table to PQ and RS with the resultant determination of the 
fifth line as their interaction PQRS . Thence the same procedure as 
before might have been followed. 
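
The relabelling procedure amounts to fixing the new names of a set of independent lines and multiplying out the rest. The following sketch assumes, as in the text, that the lines A and B are styled PQ and RS; the further choices C = P and D = S, and the function name, are assumptions made here for illustration.

```python
from itertools import combinations


def relabel(assignment, factors="ABCD"):
    """Extend the new names given to the main-effect lines to every line.

    The new name of a product of lines is the generalized interaction of
    their new names, i.e. the letters occurring an odd number of times.
    """
    table = {}
    for r in range(1, len(factors) + 1):
        for combo in combinations(factors, r):
            image = frozenset()
            for f in combo:
                image = image ^ frozenset(assignment[f])
            table["".join(combo)] = "".join(sorted(image))
    return table


# A -> PQ and B -> RS follow the text; C -> P and D -> S are assumed choices.
assignment = {"A": "PQ", "B": "RS", "C": "P", "D": "S"}
table = relabel(assignment)

if __name__ == "__main__":
    for old, new in table.items():
        print(f"{old:<5} -> {new}")
```

An assignment is admissible when the fifteen new names are all different and none is empty; a choice such as A = PQ, B = RS, C = PQRS would fail this check, since C would then be the interaction of A and B.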

§ 3. Confounding in the $2^n$ Systems.

We may consider now the possibilities of confounding treatment
comparisons when a $2^n$ system is arranged in 2, $2^2$ . . . or $2^k$ blocks
in each replication. A table similar to that already given for the
$2^4$ system may be supposed set out. If there are to be two blocks in
each replication and, in any one replication, the treatment combinations
represented by the first $2^{n-1}$ columns of the table are placed
in the first block, while those represented by the second $2^{n-1}$ columns
are placed in the second, then the treatment comparison represented
by the first line of the table, which is the difference between the




total responses to treatment in the two blocks, is confounded with 
block differences in this replication. 

If each of these blocks is divided into two halves, so that there
are $2^2$ blocks in each replication, and in any one replication the treatment
combinations represented by the first $2^{n-2}$ columns are placed
in the first block, etc., then the treatment comparisons represented
by the first two lines and by the interaction between them, which
represent the three possible comparisons which can be made between
the blocks of this replication, are confounded. It is clear that the
first two treatment comparisons may be chosen arbitrarily, and the
third is then fixed, being their interaction.

In general, if there are to be $2^k$ blocks in each replication and in
any one replication the treatment combinations represented by the
first $2^{n-k}$ columns are placed in the first block, etc., then the comparisons
represented by the first k lines and by the interactions
between them are confounded in this replication. Thus when the
main effects or interactions corresponding to the first k lines have been
chosen, the remaining confounded interactions or main effects are
uniquely determined. The selection of the first k comparisons is to
be in accordance with the restrictions given above.
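
The general result can be illustrated by a short sketch. The function names are hypothetical, and the k chosen comparisons are assumed to be independent, i.e. none is the generalized interaction of others in the list. The first function writes down the $2^k - 1$ confounded comparisons; the second places together in a block those treatment combinations which carry the same signs on every chosen comparison.

```python
from itertools import combinations, product


def confounded_set(generators):
    """All generalized interactions of k independent comparisons (2**k - 1 of them)."""
    found = set()
    for r in range(1, len(generators) + 1):
        for subset in combinations(generators, r):
            word = frozenset()
            for g in subset:
                word ^= frozenset(g)
            found.add("".join(sorted(word)))
    return sorted(found, key=lambda w: (len(w), w))


def blocks(factors, generators):
    """Group treatment combinations by their signs on the chosen comparisons."""
    out = {}
    for levels in product((1, 2), repeat=len(factors)):
        # The sign of a comparison is fixed by the parity of the number of
        # its factors standing at level 1.
        key = tuple(
            sum(levels[factors.index(f)] == 1 for f in g) % 2 for g in generators
        )
        label = "".join(f.lower() + str(lv) for f, lv in zip(factors, levels))
        out.setdefault(key, []).append(label)
    return list(out.values())


if __name__ == "__main__":
    print(confounded_set(["AB", "CD"]))
    print(confounded_set(["AB", "CD", "ACE"]))
    for block in blocks("ABCD", ["AB", "CD"]):
        print(block)
```

With AB and CD chosen in the $2^4$ system, the third confounded comparison is ABCD and each of the four blocks holds four treatment combinations.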

In any chosen design of the 2 x 2 x 2 . . . type, the possibilities
of confounding can, therefore, be written down without difficulty. 
It is not always advantageous to select arbitrarily the highest-order 
interactions for confounding, since the generalized interactions of 
these among themselves may be first-order interactions, or even 
main effects, and these will then be confounded also. The choice 
depends on the type of experiment and the specific points which the 
experimenter is desirous of investigating, but, for the agriculturist 
at least, it is, in general, desirable to leave the main effects and as
many as possible of the first-order interactions clear of block differences.


§ 4. Possible Sets of Confounded Interactions.

Table I shows for some of the simpler designs those typical sets
of treatment comparisons which, in general, it will be best to confound
in any one replication. These have been selected from all the sets
possible, on the grounds that they involve no confounding of main
effects and as little as possible of first-order interactions. The treatment
factors and comparisons are denoted by the same series of
symbols as in the earlier part of the paper. No arrangements involving
the division of a replication into two blocks only have been
included, since, in such divisions, it is possible to confound any one
desired treatment comparison.
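
The selection "from all the sets possible" can be imitated by a direct search. The sketch below, with hypothetical function names, treats only the division into four blocks: it forms every set of two comparisons with their generalized interaction, discards those containing a main effect, and records how many factors each of the three confounded comparisons involves (a first-order interaction involving two factors).

```python
from itertools import combinations


def words(factors):
    """Every main effect and interaction of the system, as a set of letters."""
    return [
        frozenset(c)
        for r in range(1, len(factors) + 1)
        for c in combinations(factors, r)
    ]


def patterns_four_blocks(factors):
    """Numbers of factors in each set {X, Y, XY} that is free of main effects."""
    patterns = set()
    for x, y in combinations(words(factors), 2):
        z = x ^ y  # generalized interaction of X and Y
        lengths = tuple(sorted((len(x), len(y), len(z))))
        if lengths[0] >= 2:
            patterns.add(lengths)
    return sorted(patterns)


if __name__ == "__main__":
    print(patterns_four_blocks("ABCD"))
    print(patterns_four_blocks("ABCDE"))
```

From such a list the experimenter may then pick the types which involve the fewest first-order interactions.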





Table I.

Typical sets of treatment comparisons which may advantageously
be confounded in some of the simpler of the 2 x 2 x 2 . . . designs.

Design.                                Sets of Treatment Comparisons.

$2^4$ in 4 blocks of 4 treatments      (AB, CD, ABCD)
                                       (AB, ACD, BCD)

$2^5$ in 4 blocks of 8 treatments      (AB, CDE, ABCDE)
                                       (ABC, ADE, BCDE)
                                       (AB, ACDE, BCDE)
                                       (AB, ACD, BCD)

$2^5$ in 8 blocks of 4 treatments      (AB, AC, BC, ADE, BDE, CDE, ABCDE)
                                       (AB, CD, ACE, ADE, BCE, BDE, ABCD)

$2^6$ in 4 blocks of 16 treatments     (ABC, ADE, BCDE)
                                       (ABC, DEF, ABCDEF)
                                       (ABC, ADEF, BCDEF)
                                       (ABCD, ABEF, CDEF)

$2^6$ in 8 blocks of 8 treatments      (AB, CD, ABCD, ACEF, ADEF, BCEF, BDEF)
                                       (BD, ABC, ACD, BEF, DEF, ACEF, ABCDEF)
                                       (BD, ABC, ACD, CEF, ABEF, ADEF, BCDEF)
                                       (ACE, BDE, BCF, ADF, ABCD, ABEF, CDEF)

$2^6$ in 16 blocks of 4 treatments     (AB, CD, EF, ACE, BCE, ADE, BDE, ACF, BCF,
                                       ADF, BDF, ABCD, ABEF, CDEF, ABCDEF)



Each set is typical of a number of exactly analogous ones which 
can be written down immediately by substituting the other treatment
factors involved for those which are given in the table. If the
experiment is a replicated one, a different group may be confounded 
in each replication, so that all the information on a particular group 
is not lost. The different groups need not be members of the one 
analogous set, but it is usually more convenient if they are so chosen. 







If, for example, the $2^5$ design arranged in eight blocks of four treatments
suits the experimenter's requirements, the use of five replications
will give a symmetrical layout in which one of the five
analogous groups composed of two first-order, four second-order,
and one third-order interaction may be confounded in each replication.
With this arrangement each of the first- and third-order
interactions occurs once, and each of the second-order interactions
twice, in the complete experiment, so that 1/5 of the information
on each of the first- and third-order interactions, and 2/5 on each
of the second-order interactions is lost.
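
That such a symmetrical layout exists can be verified by counting. The sketch below builds five analogous groups by a cyclic substitution of the factors; this particular construction, and the function name, are assumptions made for illustration and are not necessarily the arrangement the author had in view.

```python
from collections import Counter

FACTORS = "ABCDE"


def replicate_set(i):
    """Confounded group for replication i, obtained by cyclic substitution.

    Two first-order interactions (a pair of factors each), the four
    second-order interactions taking one factor from each pair together
    with the fifth factor, and the third-order interaction of the two pairs.
    """
    def f(j):
        return FACTORS[(i + j) % 5]

    pair1, pair2, odd = (f(1), f(4)), (f(2), f(3)), f(0)
    first = [pair1, pair2]
    second = [(x, y, odd) for x in pair1 for y in pair2]
    third = [pair1 + pair2]
    return ["".join(sorted(w)) for w in first + second + third]


# Number of replications (out of five) in which each comparison is confounded.
counts = Counter(w for i in range(5) for w in replicate_set(i))

if __name__ == "__main__":
    for i in range(5):
        print(i + 1, replicate_set(i))
    for word, c in sorted(counts.items(), key=lambda kv: (len(kv[0]), kv[0])):
        print(f"{word:<5} confounded in {c} of 5 replications")
```

The count for each comparison, divided by five, is the fraction of information lost on it.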

If the $2^6$ design is arranged in eight blocks of eight treatments,
and it is decided to confound analogous sets of comparisons composed
of four second-order and three third-order interactions, it might be
hoped that the use of five replications would give a symmetrical
design. This, however, does not prove to be the case. The best
that can be done with five replications is that one of the twenty
second-order interactions shall occur twice, while a second does not
occur at all, the remaining eighteen occurring once and once only.
On the other hand, if the system is arranged in sixteen blocks of
four treatments, it is possible with five replications to obtain a symmetrical
arrangement in which one of the analogous sets composed
of three first-order, eight second-order, three third-order and one
fifth-order interaction (the sets typified by that given in the last
line of Table I) is confounded in each replication. In this case
1/5 of the information is lost on each of the fifteen first-order interactions,
2/5 on each of the twenty second-order ones, 1/5 on each of
the fifteen third-order ones, while the fifth-order interaction is completely
confounded.
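
The fractions lost in the sixteen-block arrangement can be checked in the same way. The sketch below assumes one convenient way of splitting the six factors into three pairs in each replication, namely the rounds of a round-robin schedule, so that every pair of factors occurs together exactly once; this construction and the function names are illustrative assumptions only.

```python
from collections import Counter
from itertools import combinations, product

FACTORS = "ABCDEF"


def pairing(i):
    """Round i of a round-robin schedule: the six factors split into three pairs."""
    idx = [(5, i), ((i + 1) % 5, (i - 1) % 5), ((i + 2) % 5, (i - 2) % 5)]
    return [(FACTORS[a], FACTORS[b]) for a, b in idx]


def replicate_set(i):
    """The fifteen comparisons confounded in replication i."""
    pairs = pairing(i)
    first = list(pairs)                                  # 3 first-order
    second = list(product(*pairs))                       # 8 second-order
    third = [p + q for p, q in combinations(pairs, 2)]   # 3 third-order
    fifth = [tuple(FACTORS)]                             # the fifth-order interaction
    return ["".join(sorted(w)) for w in first + second + third + fifth]


# Number of replications (out of five) in which each comparison is confounded.
counts = Counter(w for i in range(5) for w in replicate_set(i))

if __name__ == "__main__":
    for n_letters in (2, 3, 4, 6):
        tally = sorted(c for w, c in counts.items() if len(w) == n_letters)
        print(n_letters, "factors:", len(tally), "comparisons, counts", set(tally))
```

Under this construction each first-order and each third-order interaction is confounded in one replication out of five, each second-order interaction in two, and the fifth-order interaction in all five.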

There is, of course, no necessity for the arrangement to be
balanced in the above manner; indeed, five replications of a $2^6$
design involve more material than an experimenter is likely to have
at his disposal. In an agricultural field trial, for example, 320
plots would be required, which implies a layout of greater magnitude
than would commonly be feasible. This, however, does not detract
from the interest of the more complicated designs. One of their
chief uses is in making exploratory surveys of new material where the
primary object is to ascertain which of a large number of possibly
relevant factors exert effects of any kind, the interactions, with the
possible exception of the first-order ones, being of secondary interest.
In such cases even a single replication may provide valuable information,
and if a small number of the first-order interactions are
inevitably confounded, as in the case of the $2^6$ arranged in
blocks of four treatments, these may be allotted to those treatments
whose effects so far as can be judged a priori are least likely to interact.





Some, or all, of the higher-order interactions can be used as an
estimate of error.¹ The partition of degrees of freedom in the
analyses of variance of a single replicate of the $2^6$ system, arranged
in blocks of eight treatments, and in blocks of four treatments, is
given in Table II (a and b).


Table II.

Partition of degrees of freedom for a single replication of the $2^6$
design arranged in eight blocks of eight treatments, and sixteen
blocks of four treatments, respectively.

(a) Eight blocks of eight treatments.

Variance due to                 Degrees of Freedom.

Blocks                           7
Main effects                     6
First-order interactions        15
Error                           35
Total                           63

(b) Sixteen blocks of four treatments.

Variance due to                 Degrees of Freedom.

Blocks                          15
Main effects                     6
First-order interactions        12
Error                           30
Total                           63
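
The entries of Table II follow from simple counting, as the sketch below indicates. It assumes that no main effect is confounded, that every unconfounded interaction above the first order is assigned to error, and that the number of confounded first-order interactions (none in eight blocks, three in sixteen) is supplied by the user; the function name is hypothetical.

```python
from math import comb


def partition(n_factors, n_blocks, first_order_confounded):
    """Degrees of freedom for a single replicate of a 2^n design in blocks."""
    total = 2 ** n_factors - 1
    blocks = n_blocks - 1
    main = n_factors
    first = comb(n_factors, 2) - first_order_confounded
    error = total - blocks - main - first
    return {
        "Blocks": blocks,
        "Main effects": main,
        "First-order interactions": first,
        "Error": error,
        "Total": total,
    }


if __name__ == "__main__":
    print(partition(6, 8, 0))
    print(partition(6, 16, 3))
```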


When a given amount of material is available, it is frequently 
better to use such designs as these, in order that as many factors 
as possible may be included, rather than to use several replications 
of a design involving fewer factors, thus necessitating the omission 
of certain factors which may be relevant. L. H. C. Tippett has recently
described similar arrangements, utilizing the properties of
five by five hyper-Graeco-Latin squares, which he used to investigate
the effects of treatment factors, ignoring all interactions.² In all
designs of this kind, preliminary investigation will show which of 
the factors suspected of being irrelevant may be ignored with 
confidence, and then, if the experimenter desires more information, 
replicated experiments can be performed with those retained. 

§ 5. Summary.

The aim of the paper is to enumerate the confounded arrangements
possible in a factorial design of the $2^n$ type, and to show how these
may be obtained. 

In the introductory section a brief account is given of some of 
the practical applications of these designs. 

The second section deals with the structure of the $2^n$ system,
and shows that the main effects and interactions are algebraically
interchangeable, so that any set of signs in the schematic diagram
given may be taken to represent any chosen set of treatment comparisons,
provided that certain restrictions, made necessary by the
relation existing between any pair of comparisons and the generalized
interaction between them, are not violated.

This leads to the general result given in the third section, namely,
that if any one replication of a $2^n$ system is divided into $2^k$ equal
blocks, then k treatment comparisons can be arbitrarily selected for
confounding, the choice being governed only by the above-mentioned
restrictions, and the remaining confounded treatment comparisons
are then fixed, being the generalized interactions between the
original k comparisons.

In the fourth section a table is given which shows, for some of the
simpler designs of the $2^n$ type, typical sets of treatment comparisons
which may advantageously be confounded in any one replication,
and certain practical applications of these designs are briefly
considered.

My thanks are due to Mr. F. Yates (Chief Statistician, Rothamsted 
Experimental Station) for the assistance he has given me throughout 
in the preparation of this manuscript. 

References.

1 Fisher, R. A., The Design of Experiments, Edinburgh: Oliver and Boyd,
1935.

2 Tippett, L. H. C., Applications of Statistical Methods to the Control of Quality
in Industrial Production, Manchester Statistical Society, 1936, pp. 1-32.




SUPPLEMENT 


TO THE 

JOURNAL 


OF THE 


ROYAL 

STATISTICAL SOCIETY 


Being the organ of the 

Industrial and Agricultural Research Section of the Society,
founded in 1933.


Vol. IV.—1937. 


LONDON: 

THE ROYAL STATISTICAL SOCIETY 

4, PORTUGAL STREET, W.C. 2. 


1937 . 



NOTICE 


The Council of the Royal Statistical Society wish it to be understood
that the Society is not responsible for the statements or
opinions expressed in the Papers read before the Society or 
published in its Journal and Supplement. 



CONTENTS.

Vol. IV.—Year 1937.

No. 1, 1937.

Statistical Method applied to Biological Assays. By J. O. Irwin, Sc.D., D.Sc.

Discussion on the Paper.

Some Considerations of the Variability of Cotton Cloth Strength. By A. W. Bayes, M.Sc.Tech.

Discussion on the Paper.

Notes on Some Statistical Problems raised in Mr. Bayes's Paper. By E. S. Pearson and B. L. Welch.

Problems arising in the Analysis of a Series of Similar Experiments. By W. G. Cochran, B.A.

Significance Tests which may be applied to Samples from any Populations. By E. J. G. Pitman.

Sub-sampling for Attributes. By M. S. Bartlett.

No. 2, 1937.

Some Examples of Statistical Methods of Research in Agriculture and Applied Biology. By M. S. Bartlett.

Discussion on the Paper.

Examples of Statistical Methods in Forest Products Research. By E. D. van Rest, B.A., B.Sc.

Discussion on the Paper.

Application of Hollerith Equipment to an Agricultural Investigation. By L. J. Comrie, Ph.D., G. B. Hey, B.A., and H. G. Hudson, B.A.

Significance Tests which may be applied to Samples from any Populations. II. The Correlation Coefficient Test. By E. J. G. Pitman.

Catalogue of Uniformity Trial Data. By W. G. Cochran.

Index to vol. IV (1937).






SUPPLEMENT 

TO THE 

JOURNAL OF THE ROYAL STATISTICAL SOCIETY 

Vol. IV., No. 1, 1937.


First Meeting of the Industrial and Agricultural Section,
Session 1936-37, November 26th, 1936, Dr. Percival Hartley,
C.B.E., M.C., in the Chair.

The Chairman said that after a long experience of Scientific 
Societies he knew of no other which had adopted this altogether 
gracious custom of inviting a complete outsider to preside at its 
meetings. For himself, he deeply appreciated the honour that had 
been done him in asking him to take the Chair at this meeting, and 
he would like at the same time to thank the Society, on behalf of 
many friends and colleagues who were present, for its kindness in 
inviting as guests so many whose major interest was in the field of 
biological research. He could assure the Society that they were 
very grateful and glad to have this opportunity of hearing this important
paper by Dr. Irwin, which had such an important bearing
on the different things which they, in their various ways, were trying 
to do. 

The following paper was then read. 

Statistical Method Applied to Biological Assays. 

By J. O. Irwin, Sc.D., D.Sc. (Division of Epidemiology and Vital
Statistics, London School of Hygiene and Tropical Medicine).

CONTENTS.

Part I—Principles.

I. Introductory.

II. Response a Continuous Variate.

(a) Dosage-response curve previously obtained.

(b) Obtaining a standard dosage-response curve.

(c) Standard dosage-response curve not available.

(d) Previous estimates of slope available.

III. Response Quantal (i.e. all or none).

(a) Transformation of response scale.

(b) Obtaining a standard linear dosage-response curve.

(c) Testing the line for goodness of fit.

(d) The case of zero or 100 per cent. response.

(e) Bliss's numerical example.

(f) Assay where a standard dosage-response curve is available.

(g) Assay where a standard dosage-response curve is not available.

IV. Approximate Methods.

(a) Gaddum's Discussion.

(b) Consistency or Bias of the Estimates used.

V. Special Methods.

Part II—Applications.

I. Introductory.

II. Gas-Gangrene Antitoxin.

(a) Intravenous injection into mice.

(b) Intracutaneous injection into guinea-pigs.

III. Vitamin C.

(a) Changes in structure of Teeth.

(b) Growth Test.

IV. Vitamin D.

(a) X-Ray Test.

(b) Line Test.

(c) Ash-content of bone test.

V. Error of Vitamin Tests in General.

VI. Antipneumococcus Serum.

(a) Method I.

(b) Method II.

PART I.—PRINCIPLES. 

I. Introductory . 

This paper on statistical method in biological assays can lay little 
claim to originality. The pioneer work of Trevan (ref. 1 ), that of 
Gaddum (ref. 2), dealing particularly with quantal responses, the 
work of Bourdillon, Coward and their colleagues in the field of 
vitamins, of Marks (ref. 3), and Hemmingsen (ref. 4 ), with insulin, 
of Bliss with insecticides, to mention only a few names, have already 
evolved most of the statistical technique necessary. But I thought 
the time had come in which a somewhat more systematic presentation 
of the principles of the subject than had hitherto been available 
might be possible, and this I have attempted to carry out. There 
are still, I think, certain points of technique on which a difference 
of opinion is possible; on these I hope the paper will lead to a discussion.
The Sub-Committee of the Pharmacopoeia Commission
dealing with the accuracy of biological assays prefaced their recent
report (ref. 20) with the hope that comments on it would be forthcoming.
On the statistical side such comments are particularly
the province of such people as are gathered here this evening. 




















1937] Applied to Biological Assays. 3


There are some therapeutic and other substances whose activity 
can only be tested by experiments on animals. The object of a 
biological assay is, in essence, to compare the potency of the particular
preparation under test with that of a standard preparation
of the same substance. In the typical situation which arises, a
dose (or several doses) of each preparation is given to a group (or
groups) of animals and some specific response is noted. The assay
is made by a comparison of the doses of the test and standard preparations
which produce the same response.

Since all animals, even offspring of the same matings, vary, 
the accuracy of the result will depend mainly on the variation of 
the response in the animals used. There will, however, be other 
sources of error: technical errors, errors of estimation, errors of 
grouping and other unknown errors. The most direct way of 
estimating the error of the result would be to obtain from different 
laboratories a number of independent estimates of the potency of 
the preparation to be tested. The variance of the results gives 
a direct estimate of the error under practical conditions, including 
all sources of error. This is sometimes the only method of approach, 
but it is seldom that enough different laboratories are available to give 
a good estimate of this kind. Besides, it is obviously desirable to 
analyse the different sources of error somewhat more closely. 

A less direct approach has therefore, as a rule, to be used, and 
much preliminary study may be needed (and is certainly repaid) 
in ascertaining (1) the relation of the response of a group of animals, 
receiving the same dose, to the dose administered, (2) the amount 
of variation in response among animals receiving the same dose. 

The response in an individual animal may be a continuous 
variate , as, for instance, the percentage of ash in the bone of a 
rat which has received a given daily dose of Vitamin D for four 
weeks, or it may be all or none, as, for example, death or survival 
after injection with a mixture of a culture and antipneumococcus 
serum. Thus the response of a group of animals, receiving the same 
dose, will in the former case be the average response of the members 
of the group, in the latter case it will be the percentage of individual 
animals in the group giving the characteristic response. 

The relation between the response of a group of animals receiving 
the same dose and the dose given need not necessarily be linear, 
and the variation in response of the group may or may not be the 
same at all levels of dosage. Where, however, the relation is linear 
and the standard deviation of response remains constant at all the 
levels of dosage used, the statistical problem is much simplified. 
These two desiderata can often, perhaps one could say generally,
be secured by choosing (1) a suitable measure of response and (2) a
suitable scale on which to measure dosage. This is the typical
situation which has to be met; where the problem cannot be reduced
to this form, some special method has to be used. We shall indicate
later how the case of the all or none or quantal response can be reduced
to this form, but we shall start by considering the case where
the response is a continuous variate.

II. The Response a Continuous Variate, the Relation between Response
and the Logarithm of the Dose Linear, and the Standard Deviation
of Response Independent of Dosage-level.

We have referred to the choice of a suitable scale on which to 
measure dosage. It is usually found that the response is linearly 
related to the logarithm of the dose, and not to the dose itself. 
When this is so the logarithm of the dose provides the suitable scale. 
In considering the relation between dosage and response, the dose may 
be expressed in terms of a standard unit, where this has been defined,*
or it may be expressed in terms of the amount by weight † (say milligrams)
of the substance given. The formulae for calculating the
potency of a test preparation in terms of a standard and the error of
the determination may be derived from two slightly different viewpoints,
according to which of the two ways of expressing dosage is
chosen. Both will be considered. The experimental procedure and 
the method of calculation will also vary according as a standard 
dosage-response curve has or has not been previously worked out. 

(a) The case where a standard dosage-response curve has previously 
been obtained . 

In this section we shall suppose that the relation between dosage 
and response has already been worked out for the standard prepara¬ 
tion and is of the form 

$$ y = a + bx \qquad (1) $$

where $x$ = logarithm (to base 10) of dose (dose in standard units),

* For example, an addendum to the Pharmacopoeia, 1932, defines the unit
of Vitamin A as follows:

1. Standard Preparation of Vitamin A.

The Standard Preparation for Great Britain and Northern Ireland is a
quantity of pure β-carotene kept in the National Institute for Medical Research,
Hampstead, London. The Standard Preparation for other parts of the British
Empire is the same, except for those countries in which a similar standard
preparation, kept in a different Institute, has been defined by law; in these
countries the standard preparation, so defined, is used.

2. The Unit of Vitamin A.

The Unit of Vitamin A activity for Great Britain and Northern Ireland
is the same as the international unit. It is defined as the specific activity contained
in 0.6 microgram (0.6 γ) of the Standard Preparation of pure β-carotene.
The unit for other parts of the British Empire is the same, except for those
countries in which a similar unit has been defined by law; in these countries
the unit, so defined, is used.

† Or sometimes by volume.






y = response of the group receiving the dose x, y being the average 
response of the n animals of this group. 

The assay may, in this case, be performed by giving one dose 
of the standard and one of the test preparation to a group of animals. 
The dose of the standard is necessary to guard against any change 
in conditions that may have taken place since the time the response 
curve was constructed, for instance a change in average sensitivity 
of the animals used. 

Let the responses of the two groups of animals to the two doses
be $y_1$, $y_2$; $y_2$ corresponding to the test preparation, $y_1$ to the standard
preparation; $x_1$, $x_2$ corresponding to these responses can immediately
be calculated from (1). The difference $x_2 - x_1$ gives
on the log. scale the excess in potency of the dose of the test preparation
over that of the standard preparation, and antilog $(x_2 - x_1)$
will be the ratio of the number of units in the doses given. We
know how many units are contained in the dose of standard given,
and how many milligrams (say) in the dose of the test preparation;
thus it is easy to calculate the potency of the test preparation
in units per gramme.

We have

$$ x_2 - x_1 = \frac{y_2 - y_1}{b} \qquad (2) $$


If $X_1$ is the log-dose of standard (dose in mgrms., say) and $X_2$
similarly the log-dose of test, then

$10^{x_1}$ standard units are contained in $10^{X_1}$ mgrms. of the standard
preparation.

$10^{x_2}$ standard units are contained in $10^{X_2}$ mgrms. of the test
preparation.

Therefore 1 mgrm. of standard and test contain respectively $10^{x_1 - X_1}$,
$10^{x_2 - X_2}$ units and

$$ \log\left(\frac{\text{potency of test}}{\text{potency of standard}}\right)
   = x_2 - x_1 + X_1 - X_2
   = X_1 - X_2 + \frac{y_2 - y_1}{b} \qquad (3) $$

To calculate the error of the determination, let $\sigma$ be the standard
deviation (known in advance) of the response in a group of animals
receiving the same dose, then the standard deviation of $(y_2 - y_1)$ is

$$ \sigma\sqrt{\frac{1}{n_1} + \frac{1}{n_2}} = \varepsilon, \text{ say,} $$

where there are $n_1$, $n_2$ animals in the two groups. As a rule $n_1 = n_2 = n$
and

$$ \varepsilon = \sigma\sqrt{\frac{2}{n}}. $$

Hence the standard error of $(x_2 - x_1)$ is $(\varepsilon/b)$.





It is then usual to take some multiple of this, corresponding to
a definite degree of probability. In recent work for the Pharmacopoeia
Commission we have taken 2.576, corresponding to odds
of 1 in 100. Then 99 times out of 100 $(x_2 - x_1)$ will be within
$\pm (2.576\,\varepsilon/b)$ of its true value. Thus antilog $(2.576\,\varepsilon/b)$ gives the
upper limit of the potency ratio of the doses given and antilog
$-(2.576\,\varepsilon/b)$ gives the lower limit.

Let antilog $(2.576\,\varepsilon/b) = 1 + p$, then we say the “limits of error
(P = 0.99)” are $\{100/(1 + p)\}$ per cent. and $100(1 + p)$ per
cent.

For example, if $p = 0.5$, the limits of error are 67-150 per cent.,
meaning the result will, 99 times out of 100, be within 67 and 150 per
cent. of its true value.

We may now obtain the above result from a slightly different
point of view. Suppose $X_1$* is the log-dose of standard, $X_2$ the
log-dose of test, both doses being measured in milligrams of standard
and test preparation actually given (or at any rate the same unit
by weight or volume of test and standard). Straight lines relating
response to log-dose (measured in log-milligrams) may be drawn
both for test and standard; they will both have the same slope $b$,
but will differ in position; the former passing through the point
$(X_2, y_2)$; the latter through $(X_1, y_1)$.

The equations to the two lines are

$$ \left.\begin{aligned}
Y - y_2 &= b(X - X_2) \\
Y - y_1 &= b(X - X_1)
\end{aligned}\right\} \qquad (4) $$

The horizontal distance between the lines gives us the logarithm 
of the potency ratio; since it gives the difference in log-dose for 
equal responses, 

* Where it is a case of comparing two preparations we have always used $X$
for log-dose, by weight or volume. We use $x$ for log-dose in “standard units.”
In dealing with a single preparation, the two differ by a constant and it is
immaterial which is used.



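When $Y = 0$,

$$ \left.\begin{aligned}
X &= X_2 - \frac{y_2}{b} \quad \text{for Test Curve} \\
X &= X_1 - \frac{y_1}{b} \quad \text{for Standard Curve}
\end{aligned}\right\} \qquad (4\ \text{bis}) $$

Hence

$$ M = \log\left(\frac{\text{potency of test}}{\text{potency of standard}}\right)
   = X_1 - X_2 + \frac{y_2 - y_1}{b} \qquad (5) $$

the same result as before.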

In the above determination of the error of the test, the value 
of b determined from previous experimentation has been supposed 
exact. We shall see below how the error of b may be taken into 
account, if necessary. It may be noted here that the error of b 
does not affect the accuracy of the result if the responses to the doses 
of standard and test administered are equal. 

For *

$$ \sigma_M^2 = \frac{\varepsilon^2}{b^2} + \frac{(y_2 - y_1)^2\,\sigma_b^2}{b^4} \qquad (6) $$

and if $y_1 = y_2$, the term in $\sigma_b$ is without influence on the result.

We may illustrate this section by a numerical example. One 
of the ways of assaying Vitamin D is to put about twenty young 
rats, each weighing 40 to 50 grammes, from three or four litters 
on a rachitogenic diet for about four or five weeks. During this 
period they are divided into two groups, each rat of one group 
having a litter mate in the other group. The rats of one group 
receive daily doses of the preparation being tested, and the rats 
of the other group receive daily doses of the standard preparation. 
At the end of the period the rats are killed and corresponding bones, 
e.g. femora or humeri , are taken from every rat. The percentage 
of ash in the dry extracted bone is calculated for each rat, and 
this is taken as the individual response. The response for a group 
of rats receiving the same dose is the average percentage of ash in 
the dry extracted bone for the members of the group. 

Now let us consider the following example: 0.77 mg. of a cod-liver
oil and 0.1 unit of Vitamin D standard were given to groups
of 10 rats, each rat in one group having a litter mate in the other.
The average percentage of ash in the bones was 45.86 in the former
case and 40.89 in the second.

* Gaddum uses A jp for this quantity. We have $\sigma_M = \sigma_u$ where $u =
(y_2 - y_1)/b$. Taking logarithmic differentials,
$\delta u / u = \delta(y_2 - y_1)/(y_2 - y_1) - \delta b / b$. Squaring
both sides and taking expectations, the result easily follows. See also the footnote
to p. 10.

From a previous examination of 219 pairs of litter mates the
value of the standard deviation of response in animals of the same
litter was found to be 2.800 and the average value of $b$ from a number
of experiments utilizing 282 rats was 21.1.

Here we have

$$ y_2 = 45.86, \quad y_1 = 40.89, \quad y_2 - y_1 = 4.97 $$

$$ x_2 - x_1 = (4.97)/(21.1) = 0.2355. $$

Ratio of number of units in the doses of test and standard given
= antilog (0.2355) = 1.720

Hence 0.77 milligram of cod-liver oil contains 0.1720 unit of Vitamin
D, or the cod-liver oil contains (172.0)/(0.77) units per gramme or
223.4 units per gramme.

For the error of the test

$$ \varepsilon = \frac{2.8\sqrt{2}}{\sqrt{10}} = \frac{3.960}{\sqrt{10}} = 1.252 $$

$$ \frac{\varepsilon}{b} = \frac{1.252}{21.1} = 0.0593 $$

$$ 2.576\,\frac{\varepsilon}{b} = 0.1528 $$

$$ \text{antilog}\left(2.576\,\frac{\varepsilon}{b}\right) = 1.42 $$

or the limits of error are (70-142) per cent.

We can be reasonably certain (i.e. only wrong on the average
once in 100 times) that the above result is between 70 and 142 per
cent. of its true value.
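
The arithmetic of this example can be retraced with a short script. The sketch below is only an illustration: the function name and argument names are hypothetical, the slope is treated as exact, and equal group sizes are assumed, as in the text.

```python
import math


def assay_from_standard_curve(y_test, y_std, b, sigma, n,
                              test_dose_mg, std_dose_units, multiple=2.576):
    """Potency (units per gramme) and limits of error, slope b taken as exact."""
    # Equation (2): difference of log-doses in standard units.
    log_ratio = (y_test - y_std) / b
    # Units contained in the test dose, then scaled from mg. to grammes.
    units_in_test_dose = std_dose_units * 10 ** log_ratio
    units_per_gramme = units_in_test_dose / test_dose_mg * 1000.0
    # Standard deviation of (y2 - y1) for two groups of n animals each.
    eps = sigma * math.sqrt(2.0 / n)
    # antilog(multiple * eps / b) is the quantity written 1 + p in the text.
    factor = 10 ** (multiple * eps / b)
    return units_per_gramme, 100.0 / factor, 100.0 * factor


potency, lower, upper = assay_from_standard_curve(
    y_test=45.86, y_std=40.89, b=21.1, sigma=2.8, n=10,
    test_dose_mg=0.77, std_dose_units=0.1)
print(round(potency, 1), round(lower), round(upper))
```

With the figures of the cod-liver oil example the script should give about 223.4 units per gramme, with limits of error close to 70 and 142 per cent.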

If we wish to find the actual potency ratio, we must know $X_1$.
Actually, 0.1 unit of Vitamin D was contained in 0.1 milligram
of the standard preparation, then,

$$ X_1 = \log(0.1) = \bar{1}.0 $$

$$ X_2 = \log(0.77) = \bar{1}.88649 $$

$$ X_1 - X_2 = \bar{1}.11351 $$

$$ \frac{y_2 - y_1}{b} = 0.2355 $$

$$ \log\left(\frac{\text{potency of test preparation}}{\text{potency of standard preparation}}\right) = \bar{1}.3490 $$

Potency Ratio = 0.2234.

Now, our standard preparation contains 1000 units of Vitamin D
per gramme, so our cod-liver oil contains 223.4 units per gramme
as before.





(b) Obtaining the standard dosage-response curve . 

The dosage-response curve may be obtained in the usual way 
by least squares. 

If $y$ = response of group of animals receiving same dose,
$x$ = log-dose,
$n$ = number of animals in group,

$$ b = \frac{S\{n y (x - \bar{x})\}}{S\{n (x - \bar{x})^2\}} \qquad (7) $$

and the standard error of $b$, obtained from such a curve, will be
given by:

$$ \sigma_b^2 = \frac{\sigma^2}{S\{n (x - \bar{x})^2\}} \qquad (8) $$

the summation $S$ extending over the different doses used.
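
Equations (7) and (8) amount to a least-squares line through the group means, each mean weighted by the number of animals behind it. A minimal sketch follows; the function name and the data are hypothetical, and the weighted mean is assumed for the bar over x.

```python
import math


def fit_standard_curve(log_doses, mean_responses, group_sizes, sigma):
    """Weighted least-squares line through group means, equations (7) and (8)."""
    total = sum(group_sizes)
    # Means of x and y weighted by group size.
    x_bar = sum(n * x for n, x in zip(group_sizes, log_doses)) / total
    y_bar = sum(n * y for n, y in zip(group_sizes, mean_responses)) / total
    sxx = sum(n * (x - x_bar) ** 2 for n, x in zip(group_sizes, log_doses))
    sxy = sum(n * y * (x - x_bar)
              for n, x, y in zip(group_sizes, log_doses, mean_responses))
    b = sxy / sxx                    # equation (7)
    a = y_bar - b * x_bar            # intercept of y = a + bx
    se_b = sigma / math.sqrt(sxx)    # square root of equation (8)
    return a, b, se_b


# Hypothetical standard curve: three log-doses, ten animals on each.
a, b, se_b = fit_standard_curve(
    log_doses=[-1.0, -0.5, 0.0],
    mean_responses=[10.0, 20.0, 30.0],
    group_sizes=[10, 10, 10],
    sigma=2.8)
print(a, b, se_b)
```

Here sigma is the standard deviation of response among animals on the same dose, supplied from previous experience or from the residual variation of the same experiment.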

This value of $\sigma_b$ may be substituted in equation (6) in cases where
a standard curve is applicable. In the example we gave above

$$ \frac{\varepsilon^2}{b^2} = (0.0593)^2 = 0.00352, \qquad
   \frac{(y_2 - y_1)^2}{b^2} = (0.2355)^2 = 0.0555. $$

When $(\sigma_b/b)$ = 10 per cent., 20 per cent., 30 per cent., respectively
we find for $2.576\,\sigma_M$ 0.1646, 0.1953, 0.2378 and the corresponding
limits of error (P = 0.99) are (68-146 per cent.), (64-157 per cent.),
(58-173 per cent.). Thus, in this case, the error of the slope begins
to affect the accuracy of the result when $(\sigma_b/b)$ exceeds 10 per cent.
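
The effect of an uncertain slope on the limits of error can be tabulated directly from equation (6), written as the sum of the squared value of eps/b and the squared product of u = (y2 - y1)/b with the relative slope error. The sketch below uses hypothetical names and the figures of the Vitamin D example.

```python
import math


def limits_with_slope_error(eps_over_b, u, rel_slope_error, multiple=2.576):
    """Equation (6) with u = (y2 - y1)/b and rel_slope_error = sigma_b / b."""
    sigma_m = math.sqrt(eps_over_b ** 2 + (u * rel_slope_error) ** 2)
    factor = 10 ** (multiple * sigma_m)
    # Returns the multiple of sigma_M and the lower and upper limits in per cent.
    return multiple * sigma_m, 100.0 / factor, 100.0 * factor


eps_over_b = 2.8 * math.sqrt(2 / 10) / 21.1
u = (45.86 - 40.89) / 21.1
results = {r: limits_with_slope_error(eps_over_b, u, r)
           for r in (0.0, 0.1, 0.2, 0.3)}
for r, (width, lower, upper) in results.items():
    print(f"sigma_b/b = {r:.0%}: {width:.4f}, limits {lower:.0f}-{upper:.0f} per cent")
```

Because the script carries more decimal places than the hand computation, the fourth figure of the printed widths may differ slightly from the values quoted in the text; the rounded limits should agree.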

It sometimes happens that dosage-response curves for a standard 
preparation show significant changes in position, when constructed 
at different times. This may be due to different batches of animals 
having different sensitivities, but the trouble may always be guarded 
against by using one dose of the test preparation and one of the 
standard preparation, each time an assay is performed. What 
is more troublesome, it may happen that such curves show significant 
changes in slope. In such cases a standard curve can hardly be 
used at all, the slope must be determined each time an assay is 
performed; this necessitates at least two doses of the test and two 
of the standard preparation. 

We now go on to consider this case. 


(c) A standard dosage-response curve not available . 

Let us suppose a number of doses of the standard and test
preparations are administered to groups of animals. Let
$X_1$ = logarithm of the dose (by weight or volume) for the standard
preparation.

$X_2$ = logarithm of the dose (by weight or volume) for the test
preparation.

$y_1$ = average response of the $n_1$ animals receiving the dose $X_1$.
$y_2$ = average response of the $n_2$ animals receiving the dose $X_2$.



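Two straight lines having the same slope $b$, but differing in position,
are fitted by least squares to the observations (cf. Fig. 1).

Let

$$ \bar{X}_1 = \frac{S_1(n_1 X_1)}{S_1(n_1)}, \qquad
   \bar{y}_1 = \frac{S_1(n_1 y_1)}{S_1(n_1)}, \qquad
   \bar{X}_2 = \frac{S_2(n_2 X_2)}{S_2(n_2)}, \qquad
   \bar{y}_2 = \frac{S_2(n_2 y_2)}{S_2(n_2)} \qquad (9) $$

the summations $S_1$, $S_2$ being over the different doses of the standard
and test preparations respectively. Then the equations to the two
straight lines, giving, for this particular assay, the dosage-response
relations for the standard and test preparations are

$$ \left.\begin{aligned}
Y - \bar{y}_1 &= b(X - \bar{X}_1) \\
Y - \bar{y}_2 &= b(X - \bar{X}_2)
\end{aligned}\right\} \qquad (10) $$

where

$$ b = \frac{S_1\{n_1 y_1 (X_1 - \bar{X}_1)\} + S_2\{n_2 y_2 (X_2 - \bar{X}_2)\}}
            {S_1\{n_1 (X_1 - \bar{X}_1)^2\} + S_2\{n_2 (X_2 - \bar{X}_2)^2\}} \qquad (10\ \text{bis}) $$

The horizontal distance between the two straight lines gives, as
before, the logarithm of the potency ratio, hence

$$ M = \log\left(\frac{\text{potency of test}}{\text{potency of standard}}\right)
   = \bar{X}_1 - \bar{X}_2 + \frac{\bar{y}_2 - \bar{y}_1}{b} \qquad (11) $$

and, as before, we have

$$ \sigma_M^2 = \frac{\varepsilon^2}{b^2} + \frac{(\bar{y}_2 - \bar{y}_1)^2\,\sigma_b^2}{b^4} \qquad (12) $$

where now

$$ \varepsilon^2 = \sigma^2\left\{\frac{1}{S_1(n_1)} + \frac{1}{S_2(n_2)}\right\} \qquad (13) $$

$$ \sigma_b^2 = \frac{\sigma^2}{S_1\{n_1 (X_1 - \bar{X}_1)^2\} + S_2\{n_2 (X_2 - \bar{X}_2)^2\}} \qquad (14) $$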


$\sigma^2$ is, of course, the variance in response of animals receiving the
same dose, and is estimated in the usual way by taking the sum of
squares of deviations from the dose means and dividing by
$(S_1(n_1) + S_2(n_2) - r)$, where $r$ is the total number of doses used.*
A numerical example of a process essentially similar to this will be
given later (pp. 24-25).
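
The two-line procedure of this section can be sketched as follows. The helper and function names are hypothetical, the data are invented for illustration, and the formulae are those of equations (9) to (14) as understood here: a common slope from the pooled sums, the log potency ratio as the horizontal distance between the lines, and its standard error.

```python
import math


def _group_stats(log_doses, means, sizes):
    """Totals needed for one preparation: N, weighted means, Sxx and Sxy."""
    total = sum(sizes)
    x_bar = sum(n * x for n, x in zip(sizes, log_doses)) / total
    y_bar = sum(n * y for n, y in zip(sizes, means)) / total
    sxx = sum(n * (x - x_bar) ** 2 for n, x in zip(sizes, log_doses))
    sxy = sum(n * y * (x - x_bar) for n, x, y in zip(sizes, log_doses, means))
    return total, x_bar, y_bar, sxx, sxy


def parallel_line_assay(standard, test, sigma):
    """standard and test are tuples (log_doses, mean_responses, group_sizes)."""
    n1, x1, y1, sxx1, sxy1 = _group_stats(*standard)
    n2, x2, y2, sxx2, sxy2 = _group_stats(*test)
    b = (sxy1 + sxy2) / (sxx1 + sxx2)                  # common slope, (10 bis)
    m = x1 - x2 + (y2 - y1) / b                        # log potency ratio, (11)
    eps_sq = sigma ** 2 * (1.0 / n1 + 1.0 / n2)        # (13)
    var_b = sigma ** 2 / (sxx1 + sxx2)                 # (14)
    var_m = eps_sq / b ** 2 + (y2 - y1) ** 2 * var_b / b ** 4   # (12)
    return b, m, math.sqrt(var_m)


# Hypothetical assay: two doses of each preparation, five animals per dose.
standard = ([0.0, 1.0], [1.0, 3.0], [5, 5])
test = ([0.0, 1.0], [1.6, 3.6], [5, 5])
b, m, se_m = parallel_line_assay(standard, test, sigma=0.5)
print(b, m, 10 ** m, se_m)
```

In this invented case the test line lies 0.6 response units above the standard line at every dose, so with a slope of 2 the log potency ratio is 0.3.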


* Often the experiment will be arranged with $n$ constant and the same
number of doses of each preparation. If so, and it is arranged that each animal
on a dose of one preparation has a litter mate on a dose of the other, the $\sigma^2$ in
(13) will be the variance in response of litter mates receiving the same dose,
while the $\sigma^2$ in (14) will be the variance in response of non-litter mates receiving
the same dose. The former will usually be smaller, and if a previous estimate of
it is available, it may be used.

(d) A number of previous estimates of slope available.

When experiments performed at different times or on different
batches of animals show real variations in slope, it may be preferable
to use an average value of $b$, derived from past experience, rather
than the $b$ derived from one particular assay. In this case equations
(11) and (12) may still be used, but the average value of $b$ must
be inserted in (11) and $\sigma_b$ will be the error of that average. This
may be calculated from the following considerations.

Suppose $N$ values of $b$ obtained from different experiments
are available.

Then

$$ b = B + e $$

where $b$ is the observed, $B$ the true slope for a particular experiment
and $e$ an error of estimate, due mainly to the sampling variance
of the animals.

The total variance of $b$ is given by

$$ V(b) = V(B) + E \qquad (15) $$

where $V(B)$ is the real variation in slope, $E$ the variance of estimate.
And if $\bar{b}$ is the mean of the $N$ observed values

$$ V(\bar{b}) = V(B) + \frac{E}{N} = V(b) - E + \frac{E}{N} \qquad (16) $$

$\sigma_b = \sqrt{V(\bar{b})}$ should be substituted in equation (12).

The value of $V(b)$ may be estimated from the $N$ observed slopes,
and for $E$ the mean variance of estimate from the $N$ experiments
may be used. It is a moot point whether any weighting should
be used in calculating $V(b)$; this can, I think, only be decided for
the particular set of data considered. Often, the difference between
weighted and unweighted variances will be small.*
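
Equation (16) is easily evaluated from a record of past slopes. The sketch below is an illustration under stated assumptions: the function name is hypothetical, the slopes and their variances of estimate are invented, the unweighted sample variance with divisor N - 1 is used for V(b), and a negative result is truncated at zero.

```python
import math
import statistics


def error_of_average_slope(slopes, estimate_variances):
    """Mean slope and its standard error from equation (16)."""
    n = len(slopes)
    b_bar = statistics.mean(slopes)
    # Unweighted estimate of V(b) from the N observed slopes (an assumption).
    v_b = statistics.variance(slopes)
    # Mean variance of estimate E over the N experiments.
    e = statistics.mean(estimate_variances)
    v_b_bar = v_b - e + e / n
    # Guard against a negative estimate arising from sampling fluctuation.
    return b_bar, math.sqrt(max(v_b_bar, 0.0))


# Hypothetical record of three experiments.
b_bar, sigma_b = error_of_average_slope([14.0, 21.0, 28.0], [9.0, 9.0, 9.0])
print(b_bar, sigma_b)
```

The real variation in slope is not reduced by averaging, which is why only the variance of estimate is divided by N.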

Thus in cases where a standard curve cannot be used, there
are two alternative procedures possible in calculating the error
of a particular assay. The value of $b$ obtained from that assay
may be used, with its appropriate error of estimate calculated
from the same experiment, or an average value of $b$ based on past
experience may be used with its appropriate error. That method
of calculation will be preferable which leads to the smaller final
error.

* In the case of antipneumococcus serum considered on pp. 44-48 unweighted
variances have been used.

In the case of the assay of Vitamin D by the ash content of
bone method, real variations in slope undoubtedly exist. Values
of $b$ varying between 14 and 28 have been found, but there are not
enough data available to obtain a good estimate of the errors of
the average slope. In such cases as this, errors for the Pharmacopoeia
have been calculated on the assumption that the doses
of standard and test preparation are adjusted so as to produce the
same average response, in which case the error of $b$ does not affect
the result. They will never produce exactly the same response, and
it has been pointed out that any device used to allow for the inequality
introduces an additional error, which is usually small. In the case
of Vitamin D the low value $b = 14$ has been used for the Pharmacopoeia.
This increases the error and should tend to compensate
for any effect of the error in slope, not otherwise allowed for.

III. The Response Quantal (i.e. all or none), and the Distribution of
the Logarithms of Individual Effective Doses, Normal.

Here, if a series of doses of a preparation are given to groups
of animals, the response of any group is the percentage of animals
showing the characteristic effect. If the percentage of animals
showing the characteristic effect is plotted against the logarithm
of the dose, a sigmoid curve will in general result: in order to get a
straight line, with much simplification of the subsequent analysis,
a special transformation of the scale on which response is measured
becomes necessary.

(a) Transformation of the response scale to obtain a linear dosage- 
response relation . 

The smallest dose which will produce the characteristic effect 
in a particular animal can be called the individual effective dose . 
If the characteristic is death, this is called the individual lethal dose . 
If we have a population of animals, the individual effective dose 
will vary from animal to animal. The distribution of individual 
effective doses in such a population may be represented by a frequency 
curve in which the abscissa represents dosage and the ordinate is 
proportional to the number of animals which have the corresponding 
individual effective dose. Such a distribution is represented in 
Fig. 2 . If a dose represented by x be given to a number of animals, 
the effect will be produced in those animals whose individual effective 
doses are smaller than x, the proportion in which we should expect 
the effect would therefore be represented by the ratio of the shaded 
area to the whole area of the curve. 





In the majority of the cases dealt with the logarithms of the 
individual effective doses are found to be normally distributed (we 
shall see later how this assumption is verified). 

We now take $x$ to be the logarithm of the individual effective
dose, $m$ to be its mean value in a large population of animals, and
$\lambda$ its standard deviation, and we suppose $x$ to be normally distributed.
The quantity

$$ Y = \frac{x - m}{\lambda} \qquad (17) $$

is called the normal deviate corresponding to $x$, or the normal equivalent
deviation. $Y$ is related to $P$, the proportion of animals showing
the characteristic effect when given the dose $x$, by the equation,

$$ P = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{Y} e^{-\frac{1}{2}t^2}\,dt \qquad (18) $$

Consequently, if $P$ is known, $Y$ may be calculated from it, and
if $m$ and $\lambda$ are also known, we have a linear dosage-response relation
between $x$ and $Y$. Such a relation may often be obtained for a
standard preparation.
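
Where tables of the normal integral are not at hand, the passage between the proportion responding and the normal equivalent deviation may be sketched with the Python standard library. The function names are hypothetical; `NormalDist` is the standard normal distribution of the `statistics` module, and the symbol lam stands for the standard deviation of the log individual effective doses.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def normal_equivalent_deviation(p):
    """Y corresponding to a proportion p, the inverse of equation (18).

    p must lie strictly between 0 and 1; 0 and 1 give infinite deviates.
    """
    return _STD_NORMAL.inv_cdf(p)


def expected_proportion(log_dose, m, lam):
    """Proportion expected to respond at a given log-dose, equations (17), (18)."""
    return _STD_NORMAL.cdf((log_dose - m) / lam)


print(normal_equivalent_deviation(0.069))
print(expected_proportion(log_dose=1.8, m=1.78, lam=0.05))
```

A response of 6.9 per cent., for instance, corresponds to a deviate of about -1.48, and a dose equal to the mean effective dose gives an expected proportion of one half.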

(b) Process of obtaining a standard linear dosage-response relation.

If a series of doses be given, each to a different group of animals, 
the proportions of animals responding, ( p ), being noted in each case 
and the normal equivalent deviations, ( y ), calculated from them, 
when the normal equivalent deviations are plotted against the 
logarithms of the doses we should get a straight line. 

Of course, we shall not obtain a straight line exactly ; errors of 
sampling will affect the result—in fact, y will differ from Y, its ex¬ 
pected value, and the problem will be to fit a straight line to the 
observed points. When the straight line has been fitted, it must be 
tested for goodness of fit; the scatter of the observed points about it 
should not be too great to be due to errors of sampling. If this 
test is satisfied, so is the assumption of the normality of the distribu¬ 
tion of the logarithms of the individual effective doses. 

The line is fitted by least squares, but the points do not all have 
equal weight even when the same number of animals are used in 
each group. This is the only real difference between this case and 
the case when the response is a continuous variate, already described. 
In fact the standard deviation of $p$ is given by

$$ \sigma_p = \sqrt{PQ/n} \qquad (19) $$

and from equation (18), we derive the relation

$$ \delta P = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}Y^2}\,\delta Y = Z\,\delta Y \qquad (20) $$

in the customary notation, whence

$$ \sigma_y^2 = (\sigma_p^2/Z^2) = (PQ/nZ^2) \qquad (21) $$

The weight of the observation $y$ is therefore

$$ W = nZ^2/PQ = nw \qquad (22) $$

and the straight line should be fitted by minimizing
$S\{W(y - Y)^2\}$.
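
The weighting coefficient depends only on the deviate, so it can be computed once and multiplied by the number of animals. A minimal sketch, with hypothetical function names:

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def weighting_coefficient(y):
    """w = Z^2 / (P Q) at the normal equivalent deviation y."""
    z = _STD_NORMAL.pdf(y)      # ordinate Z of the normal curve
    p = _STD_NORMAL.cdf(y)      # proportion P below y
    return z * z / (p * (1.0 - p))


def observation_weight(n, y):
    """W = n w for a group of n animals, equation (22)."""
    return n * weighting_coefficient(y)


for y in (0.0, 1.0, 2.0, 3.0):
    print(y, weighting_coefficient(y))
```

The coefficient is greatest at y = 0 (a 50 per cent. response) and falls away rapidly towards the tails, which is why groups showing very high or very low percentages contribute little to the fitted line.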

The values of y corresponding to given values of p may be obtained 
from Sheppard’s Tables (ref. 5 ). A very convenient table for this 
particular purpose is also given by Bliss (ref. 6). The value of the 
normal equivalent deviation will in practice always lie between 
+ 5 and — 5. Bliss therefore adds 5 to the normal equivalent 
deviation and calls the unit thus obtained a probit . By this means 
he is always dealing with positive quantities, which is a convenience 
in numerical calculation. 

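If the straight line be written

$$ Y - \bar{y} = b(x - \bar{x}) \qquad (23) $$

we have

$$ \bar{y} = \frac{S(nwy)}{S(nw)}, \qquad
   \bar{x} = \frac{S(nwx)}{S(nw)}, \qquad
   b = \frac{S\{nwy(x - \bar{x})\}}{S\{nw(x - \bar{x})^2\}} $$

$$ \text{with } w = (Z^2/PQ) \qquad (24) $$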

Strictly speaking, the weights should be calculated from the 
true values P and Y, but we do not know these. We therefore 
calculate them from the observed values $p$, modifying them afterwards,
if necessary, from the values given by the fitted line, and
thence deducing a second approximation to the line. 

(c) Testing the line for goodness of fit 

The sum of the squares of the deviations of the observed points
from the fitted line may be written,

$$ \begin{aligned}
& S\{nw(y - \bar{y})^2\} - b^2 S\{nw(x - \bar{x})^2\} \\
&\quad = S\{nw(y - \bar{y})^2\} - b\,S\{nw(x - \bar{x})(y - \bar{y})\} \\
&\quad = [S(nwy^2) - \bar{y}\,S(nwy)] - b[S(nwxy) - \bar{x}\,S(nwy)]
\end{aligned} \qquad (25) $$

the last form being the most convenient for numerical computation.

If the logarithms of the individual effective doses are in fact
normally distributed, then the dosage-response relation, in this form,
is linear, and the above quantity will be distributed in sampling
as $\chi^2$ with $(r - 2)$ degrees of freedom, where $r$ is the number of doses
given. Thus we can use a table of $\chi^2$ to test whether our hypothesis
is true or not. If the hypothesis is true, then $\sigma_y^2 = 1/(nw)$, and
we can use the formula for the variance of a weighted mean, to find
the standard errors of $\bar{y}$ and $b$. We find

$$ \sigma_{\bar{y}} = 1/\sqrt{S(nw)}, \qquad \sigma_b = 1/\sqrt{S\{nw(x - \bar{x})^2\}} \qquad (26) $$
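
The computing form of (25) and the standard errors of (26) may be sketched as follows. The function name and the figures are hypothetical; the weights passed in are the products nw for each dose, and the resulting value must still be referred to a table of the chi-square distribution with r - 2 degrees of freedom.

```python
def chi_square_about_line(xs, ys, weights):
    """Weighted sum of squares about the fitted line, last form of (25)."""
    sw = sum(weights)
    swy = sum(w * y for w, y in zip(weights, ys))
    swyy = sum(w * y * y for w, y in zip(weights, ys))
    swxy = sum(w * x * y for w, x, y in zip(weights, xs, ys))
    x_bar = sum(w * x for w, x in zip(weights, xs)) / sw
    y_bar = swy / sw
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(weights, xs))
    b = (swxy - x_bar * swy) / sxx
    chi_sq = (swyy - y_bar * swy) - b * (swxy - x_bar * swy)
    dof = len(xs) - 2
    se_y_bar = sw ** -0.5     # equation (26), valid if the line fits
    se_b = sxx ** -0.5
    return chi_sq, dof, se_y_bar, se_b, b, x_bar, y_bar


# Hypothetical deviates and weights for four doses.
xs = [1.70, 1.75, 1.80, 1.85]
ys = [-0.9, -0.2, 0.1, 1.0]
ws = [12.0, 18.0, 17.0, 9.0]
chi_sq, dof, se_y_bar, se_b, b, x_bar, y_bar = chi_square_about_line(xs, ys, ws)
print(chi_sq, dof, se_y_bar, se_b)
```

The computing form agrees with the direct weighted sum of squared residuals, which gives a convenient check on the arithmetic.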






If the hypothesis is not true, we cannot use $\sigma_y^2 = 1/(nw)$ for
the variance of $y$. We must make an estimate of its variance from
the data itself. The variance of a single weighted deviation
$(y - Y)/\sqrt{1/nw}$ will be estimated as $\chi^2/(r - 2)$, where $\chi^2$ is
equated to the expression (25). Then the variance of $y$ will be
$\chi^2/\{nw(r - 2)\}$, and so in this case

$$ \sigma_{\bar{y}} = \sqrt{\chi^2/\{(r - 2)S(nw)\}}, \qquad
   \sigma_b = \sqrt{\chi^2/[(r - 2)S\{nw(x - \bar{x})^2\}]} \qquad (27) $$

If the normal hypothesis is true, these standard errors are reconciled
with those above, owing to the fact that the average value of
$\chi^2/(r - 2)$ in sampling is unity.

Of course, if the $\chi^2$ test shows that the normal hypothesis is
not true, the straight line does not tell the whole story. If the
divergence is sufficiently great to be important, we may have to use
a curvilinear relation between the normal equivalent deviation and
the logarithm of the dose, or some other special method.


(d) The case of zero or 100 per cent. response to a given dose.

A difficulty which sometimes arises in fitting a standard curve
is that it may happen that none of the animals on the smallest
dose respond to it, or all the animals on the largest dose do so. In
this case the corresponding normal equivalent deviation is negatively
or positively infinite. If these two doses are ignored, the fitted
line will be too flat. How are they to be taken into account? R. A.
Fisher (ref. 7) has given us the solution to this problem. Consider
the case of 100 per cent. response. In this case the observed value,
$p$, is unity, and $q = 1 - p$ is zero, the observed $y$ becomes infinite
and the corresponding weight zero. In fact,

$$ W = (nz^2/pq) \quad \text{where} \quad
   q = \frac{1}{\sqrt{2\pi}} \int_{y}^{\infty} e^{-\frac{1}{2}t^2}\,dt \qquad (28) $$

Now, it may be shown that for large $y$, *

$$ q \sim \frac{1}{y\sqrt{2\pi}}\, e^{-\frac{1}{2}y^2} \qquad (29) $$

$$ \text{and } z \sim qy, \quad \text{or } z^2 \sim qyz \qquad (30) $$

Thus

$$ (z^2/q) \sim yz = (y/\sqrt{2\pi})\, e^{-\frac{1}{2}y^2} \qquad (31) $$

and

$$ W = (nz^2/q) \to 0 \text{ as } y \to \infty \qquad (32) $$

* The symbol ~, which is read “is asymptotically equivalent to,” indicates
that the ratio of the two quantities on either side of it tends to unity.

Thus if we use the observed weights, we get an infinite deviation
with zero weight which we cannot take into account directly in
fitting the line. We can derive the appropriate weight approximately
by fitting a provisional straight line to the other observations,
but how shall we get the right value of $y$ to use with it? We can
do this by using the equation

$$ P = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{Y} e^{-\frac{1}{2}t^2}\,dt $$

where $P$ is the expected proportion of responses, and $Y$ the true
normal equivalent deviation. If $n$ is not too small,

$$ \delta p = Z\,\delta y $$

or

$$ Q - \frac{s}{n} = Z\,\delta y \qquad (33) $$

where $Q = 1 - P$ and $s$ is the number of animals out of $n$ receiving
the dose in question, who do not respond.

This means that the average value of the deviation of $y$ from its
expected value, when there are $s$ animals out of $n$ who do not
respond, is

$$ \frac{Q - (s/n)}{Z} $$

and when $s = 0$, this is $Q/Z$.

Thus the average value of $y$, under these circumstances, is
$Y + Q/Z$.

We approximate to Y by fitting a provisional line to the other 
observations, calculate the corresponding values of Q and Z, and 
so get an approximation to the value of y, which should be used with 
its appropriate weight in fitting the final line. 
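
The working value for a group in which every animal responds can be computed directly once a provisional line has supplied the expected deviate. The sketch below uses hypothetical function names. The companion function for a group in which no animal responds is not derived in the text; it is added here on the assumption that the same argument applies with the two tails of the curve interchanged.

```python
from statistics import NormalDist

_N = NormalDist()


def working_deviate_all_respond(expected_y):
    """Average y when s = 0, that is Y + Q/Z, with Y from the provisional line."""
    q = 1.0 - _N.cdf(expected_y)
    z = _N.pdf(expected_y)
    return expected_y + q / z


def working_deviate_none_respond(expected_y):
    """Mirror case for a zero response, Y - P/Z (assumed by symmetry)."""
    p = _N.cdf(expected_y)
    z = _N.pdf(expected_y)
    return expected_y - p / z


# Suppose the provisional line gives Y = 1.5 at the dose where all responded.
print(working_deviate_all_respond(1.5))
```

The value so obtained is then entered in the final fitting with the weight nw calculated at the provisional Y, as described above.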

Bliss (ref. 8) gives tables of $q/z$, $y + (q/z)$, $100p$, and $(w = z^2/pq)$
at 0.1 intervals of the argument $y$, which greatly aid the numerical
calculations. Gaddum (ref. 9) gives nomograms for deriving $y$
from $p$ or vice versa, and $w$ from $p$ or vice versa.*

* Some differences in notation may be pointed out. Gaddum uses $B$ for
our $w$. Bliss uses $w$ for our ($W = nw$). Gaddum's notation has been used in
the recent Report No. 10 of the British Pharmacopoeia Commission Reports of
Committees (Aug. 1936).

(e) Bliss's numerical example.

A very detailed numerical example of the above process has
been given by Bliss (ref. 10). As the method involves only well-known
statistical procedures, and we give later a numerical example
of an assay where response is quantal but a standard curve is not
used, we shall not give a full example here. But there are one
or two points arising out of Bliss's example which are worth dealing
with. Bliss's data concern the flour beetle Tribolium confusum
and give the percentage of insects killed following 5-hour exposures
to known concentrations of carbon disulphide. We reproduce the
part of Bliss's table which gives his primary data.

Bliss had two series of insects on the same dosages; the insects 
in both series on the two doses of lowest concentration showed an 
exceptionally high mortality, and he treated the data by fitting 
separate straight lines to the lowest 3 concentrations and to the 
3rd-8th. We shall be concerned only with the second straight
line, and shall not consider the lowest 2 concentrations at all. 
In general, the two series would be fitted by separate straight lines 

Table I.

Bliss's data.

Series   Total No.     CS2, mg.     % kill     Log. of        Probit corresponding
No.      of Insects.   per Litre.   = 100p.    dosage = x.    to 100p, = y.

I        29            49.06          6.9      1.6907         3.517
         30            52.99         23.3      1.7242         4.271
         28            56.91         32.9      1.7552         4.557
         27            60.84         51.9      1.7842         5.048
         30            64.76         76.7      1.8113         5.729
         31            68.69         93.6      1.8369         6.522
         30            72.61         96.7      1.8610         6.838
         29            76.54        100.0      1.8839         7.952

II       30            49.06         13.3      1.6907         3.888
         30            52.99         20.0      1.7242         4.158
         34            56.91         26.5      1.7552         4.372
         29            60.84         48.3      1.7842         4.957
         33            64.76         87.9      1.8113         6.170
         28            68.69         86.7      1.8369         6.067
         32            72.61        100.0      1.8610         7.447
         31            76.54        100.0      1.8839         7.952


and a test of significance performed to see whether they really 
differed or not. Perhaps Bliss did this; in any case inspection 
does not suggest any consistent difference, and one straight line 
was fitted to both series. He might have combined the two sets of 
animals at each dosage; however, he chose to keep them separate. 
The latter procedure is probably preferable, as he has more degrees 
of freedom for the calculation of his estimate of error. 

Bliss’s procedure was to fit a straight line by eye to the nine points
remaining after excluding the 100 per cent. responses, read off his
probit values from this line, and to obtain the weights from these
probit values. For the three points corresponding to zero survivors,
he obtained his “fictitious” y values as explained above. He then







fitted his final line. Theoretically, this process of approximation
should be repeated, re-calculating the weights from each approximation
to obtain the next approximation, until two approximations
agree. Actually one approximation and one final line are usually all
that is required.

Gaddum (ref. 11) recommends using the weights corresponding to 
the observed values in fitting the line. The former procedure seems 
to me theoretically preferable, though I should have used Gaddum’s 
method rather than a line fitted by eye, as the first approximation. 
However, it is of some interest to see whether the alternatives differ 
very much in practice. 

Accordingly, Gaddum’s method was used, and the line fitted by 


Table II.

Comparison of Weighting Coefficients.

           Weighting Coefficients Calculated from:—
Dose.      Provisional Line by         Provisional Line Graphically
           Least Squares (Gaddum).     by Inspection (Bliss).

56.91      0.566                       0.555
60.84      0.633                       0.633
64.76      0.519                       ...
68.69      0.322                       ...
72.61      0.154                       0.125
76.54      0.056                       0.040


least squares to the nine points. The equation to the line was found 
to be, 

$$Y = 5.333 + 23.31\,(x - 1.7937) \qquad (34)$$

Y being measured in probits. 
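The fit can be sketched as follows. This is an illustration under stated assumptions rather than a record of the original arithmetic: the nine points are taken to be the 3rd to 7th concentrations of Series I and the 3rd to 6th of Series II in Table I, and each point is given the weight n·z²/pq evaluated at its observed probit. The function names are invented for the example.

```python
from statistics import NormalDist

STD = NormalDist()

# (n, log-dose x, observed probit y): rows of Table I that are neither among
# the two lowest concentrations nor 100 per cent. kills (assumed selection).
POINTS = [
    (28, 1.7552, 4.557), (27, 1.7842, 5.048), (30, 1.8113, 5.729),
    (31, 1.8369, 6.522), (30, 1.8610, 6.838),
    (34, 1.7552, 4.372), (29, 1.7842, 4.957), (33, 1.8113, 6.170),
    (28, 1.8369, 6.067),
]


def weight(probit):
    """Weighting coefficient w = z^2 / (p q) at a given probit."""
    ned = probit - 5.0
    return STD.pdf(ned) ** 2 / (STD.cdf(ned) * STD.cdf(-ned))


def fit_weighted_line(points):
    """Weighted least squares of y on x with weights W = n * w(observed y).

    Returns (y_bar, slope, x_bar), so that Y = y_bar + slope * (x - x_bar).
    """
    W = [n * weight(y) for n, _, y in points]
    sw = sum(W)
    x_bar = sum(w * x for w, (_, x, _) in zip(W, points)) / sw
    y_bar = sum(w * y for w, (_, _, y) in zip(W, points)) / sw
    sxx = sum(w * (x - x_bar) ** 2 for w, (_, x, _) in zip(W, points))
    sxy = sum(w * (x - x_bar) * (y - y_bar) for w, (_, x, y) in zip(W, points))
    return y_bar, sxy / sxx, x_bar


y_bar, slope, x_bar = fit_weighted_line(POINTS)
print(f"Y = {y_bar:.3f} + {slope:.2f}(x - {x_bar:.4f})")
```

Under these assumptions the constants come out close to those of equation (34); small differences in the last figure are to be expected from rounding in the tabulated probits.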

Bliss does not give the equation to his provisional line fitted by 
eye, but the weighting coefficients w calculated from (34) may be 
compared with Bliss’s weighting coefficients. 

The comparison suggests that Bliss’s provisional line was a 
remarkably good shot, and that such lines can be used by those who 
can work graphically with equal accuracy. 

Using the above weights from the provisional line fitted by least
squares, and allowing for the 100 per cent. responses by the method
explained above, a second approximation may be obtained and
compared with Bliss’s second approximation and with (34).

We find,

$$Y = 5.491 + 25.15\,(x - 1.7985) \qquad (35)$$







while Bliss’s second approximation, which he takes for the final
line is,

$$Y = 5.450 + 25.51\,(x - 1.7967) \qquad (36)$$

It is more instructive to compare the three sets of expected probits
(i.e., the values of Y calculated from the fitted line).


Table III.

           Values of Y calculated from:—
Dose.      (34)        (35)        (36)

56.91      4.436       4.402       4.391
60.84      5.112       5.132       5.131
64.76      5.744       5.813       5.822
68.69      6.340       6.457       6.475
72.61      6.902       7.063       7.090
76.54      7.436       7.639       7.674


The standard error of any fitted value is

$$\sqrt{\sigma_a^2 + (x - \bar{x})^2\,\sigma_b^2} \qquad (37)$$

This has a minimum value of √{1/S(nw)} when x = x̄, and this is
about 0.085. The largest difference between the two final approximations
is 0.03, and this is quite negligible. The largest difference
between the first and second approximation is 0.20, and this is not
significant. It thus appears that using the weights calculated from
the observed values of y will often be a sufficiently accurate process.*
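The comparison of the three lines can be checked with a few lines of Python. This is a sketch only: columns (34) and (35) are regenerated from the printed equations, while column (36) is taken from Table III as printed; the variable names are invented for the example.

```python
# Log-doses of the six concentrations entering Table III.
LOG_DOSE = [1.7552, 1.7842, 1.8113, 1.8369, 1.8610, 1.8839]


def line_34(x):
    """First approximation, equation (34)."""
    return 5.333 + 23.31 * (x - 1.7937)


def line_35(x):
    """Second approximation, equation (35)."""
    return 5.491 + 25.15 * (x - 1.7985)


# Column (36) of Table III as printed (Bliss's final line).
TABLE_36 = [4.391, 5.131, 5.822, 6.475, 7.090, 7.674]

y34 = [line_34(x) for x in LOG_DOSE]
y35 = [line_35(x) for x in LOG_DOSE]

# Largest differences between successive approximations.
first_vs_second = max(abs(a - b) for a, b in zip(y34, y35))
second_vs_bliss = max(abs(a - b) for a, b in zip(y35, TABLE_36))
print(round(first_vs_second, 2), round(second_vs_bliss, 2))
```

Both differences are small beside the minimum standard error of about 0.085 multiplied by the usual factors, which is the point made in the text.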


* There is one point in Bliss’s treatment with which I cannot agree. In
performing the χ² test for the fit of the straight line, in cases where there is no
insect or only one killed or surviving at a particular dosage level, all such doses
are considered to contribute collectively only one degree of freedom. In the
above example, for instance, there are 4 such doses, and accordingly Bliss enters
the χ² table with 12 − 2 − 3 = 7 degrees of freedom. If he does not do this,
he says, “the apparent goodness of fit will be exaggerated by the inclusion of
observations which, because of their small weight, contribute little to the
observed χ².”

He appears to me to forget that though these observations have little weight
they will have a correspondingly greater deviation from the line; in fact, the
purpose of weighting is precisely to secure that the contribution of each point
to χ² shall be the same. In fact, we may write

$$\chi^2 = S\left\{\left(\frac{y - Y}{\sigma_y}\right)^2\right\}$$

in which each term clearly has unit weight.

When p is near 0 or 1, a difficulty does arise because (i) the formula
w = (Z²/PQ) is only approximate, and (ii), because the distribution of y
calculated from the formula

$$p = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{y} e^{-\frac{1}{2}t^{2}}\,dt$$






(f) Carrying out an assay where a standard dosage-response curve is
available.

The assay may be made by using one dose of the standard and 
one of the test preparation (if we were quite sure that the position of 
the standard curve at the time the assay was made was applicable, 
the dose of standard could be dispensed with). The statistical 
technique is now precisely similar to that described in Part I, Section 
II (a). The only difference is the difference in weighting. In the 
notation of that section, 


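$$M = \log\left(\frac{\text{potency of test}}{\text{potency of standard}}\right) = X_1 - X_2 + \frac{y_2 - y_1}{b} \qquad (38)$$

It is assumed that the slope of the standard curve is known without
error, hence

$$\sigma_M^2 = \frac{\sigma_{y_1}^2 + \sigma_{y_2}^2}{b^2} = \frac{1}{b^2}\left(\frac{1}{W_1} + \frac{1}{W_2}\right),$$

with

$$W_1 = n_1 Z_1^2/P_1 Q_1, \qquad W_2 = n_2 Z_2^2/P_2 Q_2 \qquad (39)$$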


and limits of error are calculated as in Part I, Section II (a). 


may not be normal. But there is no evidence that the net effect is to reduce the
contribution to χ².

Not much can be learnt from the value of χ² in one particular case; but
it is of some interest to examine the contributions to χ² of the 12 points in
Bliss’s experiment. We have :—


y = Probit Corresponding   Y = Value from
to Observed Deaths.        Fitted Line.
(Bliss’s Values.)          (Bliss’s Values.)    (y − Y)².    W = nw.    W(y − Y)².

4.557                      4.391                ...          15.5       ...
5.048                      5.131                ...          17.1       ...
5.729                      5.822                ...          ...        ...
6.522                      6.475                ...           9.1       ...
6.838                      7.090                ...           3.8       ...
7.952                      7.674                0.0773        1.2       ...
4.372                      4.391                ...          18.9       ...
4.957                      5.131                ...          18.4       ...
6.170                      5.822                0.1211       16.5       1.998
6.067                      6.475                0.1665        8.2       1.365
7.447                      7.090                0.1274        4.0       ...
7.952                      7.674                0.0773        1.2       ...

                                                                  χ² = 5.559


It does not appear that the points with low weight contribute less to χ² than
those with high weight. The highest weight of all (18.9), gives the lowest
contribution, only (0.007), while the highest contribution (1.998) is given by a
weight of 16.5, little less than 18.9. The coefficient of correlation between weight
and contribution to χ² is 0.19 or quite insignificant.
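The individual contributions W(y − Y)² can be recomputed from the columns of the table. The sketch below uses only the eleven points whose weight is legible in this copy, so it does not reproduce the total of 5.559 or the correlation; the variable names are invented for the example.

```python
# (observed probit y, fitted probit Y, weight W = n*w) for the eleven points
# whose weight can be read in the table of contributions.
ROWS = [
    (4.557, 4.391, 15.5), (5.048, 5.131, 17.1), (6.522, 6.475, 9.1),
    (6.838, 7.090, 3.8), (7.952, 7.674, 1.2),
    (4.372, 4.391, 18.9), (4.957, 5.131, 18.4), (6.170, 5.822, 16.5),
    (6.067, 6.475, 8.2), (7.447, 7.090, 4.0), (7.952, 7.674, 1.2),
]

# Contribution of each point to chi-square, paired with its weight.
contributions = [(W, W * (y - Y) ** 2) for y, Y, W in ROWS]

largest = max(contributions, key=lambda pair: pair[1])
smallest = min(contributions, key=lambda pair: pair[1])
print("largest :", largest[0], round(largest[1], 3))
print("smallest:", smallest[0], round(smallest[1], 3))
```

The largest and smallest contributions both come from points of high weight, in agreement with the remark that low weight does not imply a small contribution.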

















(g) Carrying out an assay where a standard dosage-response curve is
not available.


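The statistical technique is now precisely similar to that of Part I,
Section II (c). In the notation of that section, the slope b will be
given by

$$b = \frac{S_1\{n_1 w_1 y_1 (X_1 - \bar{X}_1)\} + S_2\{n_2 w_2 y_2 (X_2 - \bar{X}_2)\}}{S_1\{n_1 w_1 (X_1 - \bar{X}_1)^2\} + S_2\{n_2 w_2 (X_2 - \bar{X}_2)^2\}} \qquad (40)$$

with

$$\bar{X}_1 = \frac{S_1(n_1 w_1 X_1)}{S_1(n_1 w_1)}, \qquad \bar{y}_1 = \frac{S_1(n_1 w_1 y_1)}{S_1(n_1 w_1)},$$

$$\bar{X}_2 = \frac{S_2(n_2 w_2 X_2)}{S_2(n_2 w_2)}, \qquad \bar{y}_2 = \frac{S_2(n_2 w_2 y_2)}{S_2(n_2 w_2)},$$

$$w_1 = z_1^2/p_1 q_1, \qquad w_2 = z_2^2/p_2 q_2 \qquad (40\ \text{bis})$$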


The weights are calculated in the first place from the observed
values of y; they may subsequently be adjusted when a first approximation
to the line has been fitted, and a second approximation
obtained from the adjusted values.

The adequacy of the linear relation may be tested as explained
in Section III (c), by calculation of,

$$\chi^2 = S\{nw(y - \bar{y})^2\} - b^2\,S\{nw(X - \bar{X})^2\} \qquad (41)$$

the summation being over all doses of both preparations, and
entering the χ² table with (r − 2) degrees of freedom where r is the
number of doses. Provided the relation is found to be linear, we
have,


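$$M = \log\left(\frac{\text{potency of test}}{\text{potency of standard}}\right) = \bar{X}_1 - \bar{X}_2 + \frac{\bar{y}_2 - \bar{y}_1}{b}$$

$$\sigma_M^2 = \frac{\sigma_{\bar{y}_1}^2 + \sigma_{\bar{y}_2}^2}{b^2} + \frac{(\bar{y}_2 - \bar{y}_1)^2\,\sigma_b^2}{b^4} = \frac{1}{b^2}\left\{\frac{1}{S_1(n_1 w_1)} + \frac{1}{S_2(n_2 w_2)}\right\} + \frac{(\bar{y}_2 - \bar{y}_1)^2\,\sigma_b^2}{b^4} \qquad (42)$$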


If the error of the slope is calculated from the evidence provided by
the assay itself,

$$\sigma_b^2 = \frac{1}{S_1\{n_1 w_1 (X_1 - \bar{X}_1)^2\} + S_2\{n_2 w_2 (X_2 - \bar{X}_2)^2\}} \qquad (43)$$

If the χ² test shows a small, but significant, departure from linearity,
which we regard as too small to lead to the abandonment of the
linear form, it will be preferable, as previously explained, to substitute
χ²/(r − 2) for unity where it occurs in the expressions for σ_ȳ² and σ_b².

As explained in Section II (d), it may sometimes be preferable to
use previous estimates of slope in the calculation of σ_b².

Gaddum (ref. 2, p. 31) has given the simple expressions to which
equations (40), (42), and (43) reduce when only two doses of each preparation
are used and the same two doses by volume of each preparation
are administered, when each dose is given to the same number (n)
of animals, and the weighting coefficient w may be assumed constant
throughout. The last assumption is often sufficiently accurate
since w has its maximum value of 0.637 when there is a 50 per cent.
response and has only decreased to 0.490 for a 20 per cent. or 80
per cent. response (0.5 is a sufficiently accurate value for many
purposes). In this case, if X₁₁, X₁₂ are the two log-doses of the first
preparation, X₂₁, X₂₂ of the second, if y₁₁, y₁₂, y₂₁, y₂₂ are the
corresponding normal equivalent deviations or probits, we have

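$$X_{11} = X_{21}, \quad X_{12} = X_{22}, \quad X_{12} - X_{11} = X_{22} - X_{21} = d \ \text{(say)},$$

and we find

$$b = \frac{y_{12} + y_{22} - y_{11} - y_{21}}{2d} \qquad (44)$$

$$\sigma_b^2 = \frac{1}{nwd^2} \qquad (45)$$

$$M = \log\left(\frac{\text{potency of 2nd preparation}}{\text{potency of 1st preparation}}\right) = \frac{y_{21} + y_{22} - y_{11} - y_{12}}{2b} \qquad (46)$$

$$\sigma_M^2 = \frac{1}{nwb^2}\left(1 + \frac{M^2}{d^2}\right) \qquad (47)$$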
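The reduction of the general formulae to the two-dose case can be verified numerically. The sketch below codes the common-slope estimates in the form of equations (40), (42) and (43), and the two-dose expressions in the form of (44) to (47); the function names and the probits used in the check are hypothetical, chosen only to exercise the algebra.

```python
def weighted_mean(values, weights):
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


def parallel_line_assay(X1, y1, W1, X2, y2, W2):
    """Common slope b, log potency ratio M and its variance.

    X, y are log-doses and probits; W = n*w is the weight of each dose.
    Preparation 1 is the standard, preparation 2 the test.
    """
    xb1, yb1 = weighted_mean(X1, W1), weighted_mean(y1, W1)
    xb2, yb2 = weighted_mean(X2, W2), weighted_mean(y2, W2)
    num = (sum(w * y * (x - xb1) for x, y, w in zip(X1, y1, W1))
           + sum(w * y * (x - xb2) for x, y, w in zip(X2, y2, W2)))
    den = (sum(w * (x - xb1) ** 2 for x, w in zip(X1, W1))
           + sum(w * (x - xb2) ** 2 for x, w in zip(X2, W2)))
    b = num / den
    var_b = 1.0 / den
    M = xb1 - xb2 + (yb2 - yb1) / b
    var_M = (1.0 / sum(W1) + 1.0 / sum(W2)) / b ** 2 \
        + (yb2 - yb1) ** 2 * var_b / b ** 4
    return b, M, var_M


def two_dose_assay(y11, y12, y21, y22, d, n, w):
    """Two doses of each preparation, equal groups, constant w."""
    b = (y12 + y22 - y11 - y21) / (2.0 * d)
    M = (y21 + y22 - y11 - y12) / (2.0 * b)
    var_M = (1.0 + M * M / (d * d)) / (n * w * b * b)
    return b, M, var_M


# Hypothetical probits: doses in the ratio 2 : 1, 20 animals per dose, w = 0.5.
d, n, w = 0.3010, 20, 0.5
general = parallel_line_assay([0.0, d], [4.4, 5.6], [n * w] * 2,
                              [0.0, d], [4.9, 6.0], [n * w] * 2)
simple = two_dose_assay(4.4, 5.6, 4.9, 6.0, d, n, w)
print(general)
print(simple)
```

The two routes give the same slope, log potency ratio and variance, as they should if the two-dose expressions are a special case of the general ones.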


We may illustrate this section by an experiment in assaying
antipneumococcus-serum Type I (Wilson Smith’s data). Five doses of
two sera, S32 and S37, were given together with a test-dose of the
culture to batches of 40 mice. The test dose was 0.01 c.c. of a
17-hour-old culture of pneumococci, and contained (2130 × 10⁶)
cocci per c.c. The serum doses were 0.0004375, 0.000875, 0.00175,
0.0035, and 0.007 c.c. of each serum. The mortalities were as
follows:—

Table IV.

              S37.                                   S32.
Serum Dose (c.c.).   Total Deaths       Serum Dose (c.c.).   Total Deaths
                     per Group.                              per Group.

0.0004375            40                 0.00175              ...
0.000875             38                 0.0035               38
0.00175              26                 0.007                ...
0.0035               10                 0.014                21
0.007                 6                 0.028                10



It is convenient to work with the number of mice protected rather 
than those killed, so as to have increasing response with increasing 








dose, and since each dose is double the previous dose, it is simpler
to work with logarithms to the base 2. We may then take X =
1, 2, 3, 4, 5 for both sera and the potency ratio (S32/S37) finally
obtained is divided by 4, since the doses of S32 are four times the
corresponding doses of S37.

Table V shows the details of the calculation. As a first approximation
separate straight lines were fitted to the data for each serum
separately, using the weights obtained from the observed responses.
The weights were then recalculated from the fitted values (Y), and
modified weights used in the second (final) approximation.

In the first approximation each line was calculated with its own
slope, as the separate slopes were required for another purpose; in
the second approximation the lines were fitted with a common slope
according to equation (40). Had the first approximation been fitted
with lines of common slope, a slight modification of the weights used
in the second approximation would have resulted, but the difference
would not materially affect the final result.

The probits in the first approximation have been obtained from
Bliss’s Table I (ref. 6); for the first dose, which gives a zero response,
the value of y in the second approximation is obtained by looking up
q/z corresponding to (10 − 2.8123 = 7.1877) in Bliss’s Table II
(ref. 8), and subtracting the result (0.3937) from 2.8123. This gives
2.4186 and the corresponding weighting coefficient obtained from
Bliss’s Table III is (0.0416). If Bliss’s tables are not available, any
table of the ordinate and area of the probability integral may be
used.

Equations (40), (40 bis), (42) and (43) give all the formulae
required. When the two final parallel straight lines have been
obtained, it is convenient in practice to calculate the values of X
corresponding to the median effective doses (i.e., those which produce
50 per cent. of responses), for which the normal equivalent deviation
is zero, or the probit 5. The difference between those gives the
logarithm of the potency ratio. In the present instance, S32 has
roughly one-fifth of the strength of S37, and the limits of error
(P = 0.99) are 70-142 per cent.

IV. Approximate Methods in which only One or a Few Animals
are Used on each Dose.

Before the methods in the preceding sections can be applied with
effect, we need to have some preliminary information about the
relation between dose and response, so that the doses and the dose
intervals may be chosen to the best advantage. Here approximate
methods in which one or only a few animals are used on each dose
can be very useful.



Table V.

Details of Calculation of an Assay of the Relative Potency of two Antipneumococcus-Sera.

[Tabular matter for S37 and S32 (17-hour culture), first and second approximations, not recoverable from this copy. Legible column headings: X; mice protected out of 40; probit or normal e.d., y; weighting coefficient, w.]



(a) Gaddum's Discussion. 

The dose to which 50 per cent, of a population of animals would 
respond is called the median effective dose. Following Gaddum, we 
will denote this by the symbol L.D.50. Gaddum has discussed three 
approximate methods of determining the L.D.50 of a preparation. 
These are: 

(1) The method* in which a series of doses, each bearing a 
constant ratio to the preceding one, is given, each dose to one 
animal. The L.D.50 is estimated as the geometric mean of the 
smallest effective dose and the largest ineffective dose. 

(2) Behrens' Method. Here about 6 animals are used for
each of 7 or more doses and a quotient is calculated for each dose
in which the numerator is equal to the total number of animals
which have reacted positively to that and smaller doses, and the
denominator is obtained by adding to the numerator the total
number of animals which have reacted negatively to that and
larger doses. If one of these quotients is equal to 0.5 the
corresponding dose is taken as the L.D.50. If not, the L.D.50
is obtained by interpolation.

(3) Karber’s Method. Here the L.D.50 is estimated by the
formula,

$$m = X_0 - S\left\{\frac{p'_1 + p'_2}{2}\,d\right\}$$

where m = logarithm of the L.D.50, X₀ = the logarithm of a
dose to which all animals react, and the second term is obtained
by multiplying the mean of each successive pair of observed
mortalities (or in general proportional responses) by the corresponding
dose interval. The terms p′₁, p′₂, each occur twice in
the summation.


Gaddum has determined by empirical methods the standard error
of these estimates (ref. 2, pp. 25-8). The reader must be referred to
his paper for a discussion of the methods he used; but in all three
cases he finds,

$$\sigma_m^2 = \frac{K\lambda d}{n} \qquad (48)$$

where d is the difference between the logarithms to the base 10 of two
successive doses, n the number of animals receiving each dose, λ
the standard deviation of the logarithms of the individual effective
doses, m is the logarithm of the L.D.50. For dose intervals between
1.0λ and 2.0λ, which is the important range in practice, he finds in
case (1), K = 0.57, in case (2), K = 0.66, in case (3), K = 0.564.
He also points out, that if the assay had been carried out with






reference to a population in which the slope of the characteristic
curve was exactly known, we should have had

$$\sigma_m^2 = \frac{1}{b^2}\cdot\frac{1}{nS(w)} = \frac{\lambda^2}{nS(w)} = \frac{K\lambda d}{n}, \quad \text{where } K = 1\Big/\Big\{S(w)\,\frac{d}{\lambda}\Big\} \qquad (49)$$

The quantity 1/{S(w)(d/λ)} can be calculated from tables of the
probability integral or similar tables; it is found very nearly constant
(0.554) for different values of (d/λ). This result might suggest that
the approximate methods are almost as accurate as the more refined
one (cf. 0.554 and 0.6 or 0.7), but Gaddum is careful to point out that
this is not so. They are nearly as accurate for a given dose interval
and a given number of animals on each dose, when the whole range
between zero and 100 per cent. response must be covered. But a
much smaller error than this could be obtained, for a given number
of animals, if a standard curve is available, by narrowing the interval
between the doses and concentrating them closer to the L.D.50,
where the observations will have a much greater weight.

For example, if there are r doses and N animals are available,
(48) gives

$$\sigma_m^2 = \frac{K\lambda d\,r}{N}.$$

With r = 8, n = 6, K = 0.55, N = 48, d = λ, we have

$$\sigma_m^2 = 0.09\lambda^2.$$

But by concentrating the 8 doses in the region between 25 per cent.
and 75 per cent. response for which we may take w = const. = 0.6
approximately, we have, using a standard curve:

$$\sigma_m^2 = 0.03\lambda^2$$

or the variance is only about a third of the previous value.
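Both statements can be checked numerically. The sketch below evaluates K = 1/{S(w)(d/λ)} for doses spaced d apart over a range of ± 6λ or so (the range and the function names are choices made for this illustration), and then repeats the arithmetic of the example with 48 animals.

```python
from statistics import NormalDist

STD = NormalDist()


def weight(ned):
    """Weighting coefficient w = z^2 / (p q) at a normal equivalent deviation."""
    return STD.pdf(ned) ** 2 / (STD.cdf(ned) * STD.cdf(-ned))


def k_known_slope(d, offset=0.0, half_range=6.0):
    """K = 1 / {S(w) d}, with d and offset in units of lambda.

    Doses lie at offset + k*d; beyond about 6 lambda the weights are negligible.
    """
    k_max = int(half_range / d)
    s_w = sum(weight(offset + k * d) for k in range(-k_max, k_max + 1))
    return 1.0 / (s_w * d)


for d in (1.0, 2.0):
    print(d, round(k_known_slope(d), 3), round(k_known_slope(d, offset=0.5), 3))

# The worked example: r = 8 doses, N = 48 animals, d = lambda.
K, r, N = 0.55, 8, 48
var_full_range = K * 1.0 * r / N      # equation (48) with n = N / r
var_concentrated = 1.0 / (N * 0.6)    # lambda^2 / {n S(w)} with w = 0.6 throughout
print(round(var_full_range, 3), round(var_concentrated, 3))
```

The ratio of the two variances is a little over one third, which is the comparison drawn in the text.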

(b) The question of the consistency or bias of the estimates used . 

While Gaddum has examined the standard errors of these estimates,
I do not think their consistency has ever been examined
previously. A method of estimation from a sample is said to be 
consistent when it would give the right answer if applied to the whole 
population from which the sample is drawn. I have attempted to 
examine this question on the hypothesis that the logarithms of the 
individual effective doses, in a population of animals, are normally 




distributed. In this section, I shall use the term dose to mean 
log-dose, or the dose measured on a logarithmic scale. 

In method (1), the number of animals cannot be increased without 
at the same time increasing the number of doses; as this is done, the 
deviations of the largest ineffective and the smallest effective doses 
from the L.D.50 tend to equality in magnitude, though opposite in 
sign, and both tend to increase indefinitely. In this sense, the method
is consistent, but it is possible to examine the question of bias when
we are limited to a fixed dose interval. A method of estimation is
said to be unbiased if the expected value of the estimate in repeated
samples is equal to the quantity estimated. This will clearly not be
the case unless the doses chosen are symmetrically situated with 
regard to the true L.D.50. 

In methods (2) and (3) we can calculate the estimate when an 
indefinitely large number of animals are supposed used on each dose; 
there will clearly be an inconsistency if the doses are not symmetrically
situated with regard to the true L.D.50. In fact, there must be an 
error which is periodic in nature as the position of the doses with 
regard to the true L.D.50 is varied, an error analogous to the periodic 
error in moments due to grouping, which Fisher has shown may 
become important when the interval of grouping exceeds the standard 
deviation (ref. 12). 

In all three cases, the error in question will vanish when the dose 
nearest the L.D.50 coincides with it, or differs from it by half a dose 
interval. 

Accordingly, one would expect the error to be near its maximum 
when the dose nearest the L.D.50 differs from it by a quarter of a 
dose interval. 

For method (1), a dose interval of 2λ was taken, and doses equal
to − 4.5, − 2.5, − 0.5, + 1.5, + 3.5, + 5.5 times the standard
deviation from the true L.D.50. A table of the normal curve then
gives the following results :—

Table VI.

(Dose −        Chance of     Chance of     Chance of Negative    Chance Given
L.D.50)/λ      Positive      Negative      Response to           Dose is Lowest
               Response.     Response.     Lower Doses.          Effective.

− 4.5          0.0000034     0.9999966     1.0                   0.0000034
− 2.5          0.0062097     0.9937903     0.9999966             0.0062097
− 0.5          0.3085375     0.6914625     0.9937869             0.3066205
  1.5          0.9331928     0.0668072     0.6871664             0.6412587
  3.5          0.9997674     0.0002326     0.0459077             0.0458970
  5.5          1.0           0             0.0000107             0.0000107

                                                                 1.0000000





Hence the expected value of the smallest effective dose is
m + 0.95374λ. Similarly, the expected value of the largest
ineffective dose is m − 0.94478λ. Hence the estimated L.D.50 is
m + 0.00448λ.
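The calculation behind Table VI can be sketched in a few lines. The code assumes, as the table does, one animal per dose and normally distributed log effective doses; the function names are invented for the illustration, and doses are expressed as deviations from the true L.D.50 in units of λ.

```python
from statistics import NormalDist

STD = NormalDist()

# (dose - L.D.50)/lambda for the six doses of Table VI, in ascending order.
DOSES = [-4.5, -2.5, -0.5, 1.5, 3.5, 5.5]


def expected_smallest_effective(doses):
    """Sum of dose * P(negative at all lower doses) * P(positive at this dose)."""
    total, all_lower_negative = 0.0, 1.0
    for x in doses:
        total += x * all_lower_negative * STD.cdf(x)
        all_lower_negative *= STD.cdf(-x)
    return total


def expected_largest_ineffective(doses):
    """Sum of dose * P(negative at this dose) * P(positive at all higher doses)."""
    total, all_higher_positive = 0.0, 1.0
    for x in reversed(doses):
        total += x * STD.cdf(-x) * all_higher_positive
        all_higher_positive *= STD.cdf(x)
    return total


low = expected_smallest_effective(DOSES)
high = expected_largest_ineffective(DOSES)
# Geometric mean of the doses = arithmetic mean on the logarithmic scale.
bias = 0.5 * (low + high)
print(round(low, 5), round(high, 5), round(bias, 5))
```

Shifting every entry of `DOSES` by the same amount gives the bias for other positions of the doses relative to the true L.D.50.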

These calculations were repeated for the same interval when the
nearest doses below the true L.D.50 were at − 0.25λ and − 0.75λ,
respectively and both gave an estimated L.D.50 of m + 0.00317λ.

Thus we have :— 

Table VII.

Nearest Dose below L.D.50.               Est. L.D.50 − True L.D.50.

⅛-dose interval from true L.D.50         0.00317λ
¼   ,,      ,,     ,,    ,,              0.00448λ
⅜   ,,      ,,     ,,    ,,              0.00317λ



Then the conjecture that the bias is greatest for an asymmetry of a
quarter of a class interval is supported, but the bias is quite negligible,
since the standard-error according to Gaddum’s formula is

$$\sqrt{0.57 \times 2 \times \lambda^{2}} = 1.07\lambda.$$

As a matter of curiosity these calculations were repeated for an
interval equal to λ, and the deviations from the true L.D.50 of the
expected values of the smallest effective dose and the highest ineffective
dose differed by only one unit in the fifth decimal place.

By a somewhat similar method, the consistency of Behrens'
method may be examined.

Taking a dose interval equal to 2λ and doses m − 4.5λ,
m − 2.5λ ... as before, we find :—

Table VIII.

(Dose − L.D.50)/λ.       Behrens' Ratio.

− 4.5                    0.0000102
− 2.5                    0.0035333
− 0.5                    0.2932678
  1.5                    0.9490185
  3.5                    0.9998905
  5.5                    1.0



Interpolating linearly between − 0.5 and 1.5 (the interval is too wide
to attempt anything more accurate), we find :—

Estimated L.D.50 − True L.D.50 = 0.1305λ.
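The large-sample value of Behrens' ratio and the interpolated estimate can be sketched as follows. The code assumes an indefinitely large number of animals on each dose, so that the observed proportions are replaced by the normal-curve proportions; the function names are invented for the example.

```python
from statistics import NormalDist

STD = NormalDist()

# (dose - L.D.50)/lambda, in ascending order, as in Table VIII.
DOSES = [-4.5, -2.5, -0.5, 1.5, 3.5, 5.5]


def behrens_ratios(doses):
    """Ratio for each dose: positives at that and smaller doses, divided by
    the same plus negatives at that and larger doses (population values)."""
    p = [STD.cdf(x) for x in doses]
    q = [STD.cdf(-x) for x in doses]
    ratios = []
    for i in range(len(doses)):
        numerator = sum(p[: i + 1])
        ratios.append(numerator / (numerator + sum(q[i:])))
    return ratios


def dose_at_half(doses, ratios):
    """Linear interpolation for the dose at which the ratio equals 0.5."""
    for i in range(len(doses) - 1):
        if ratios[i] <= 0.5 <= ratios[i + 1]:
            step = (0.5 - ratios[i]) / (ratios[i + 1] - ratios[i])
            return doses[i] + step * (doses[i + 1] - doses[i])
    raise ValueError("ratio 0.5 is not bracketed by the doses given")


ratios = behrens_ratios(DOSES)
estimate = dose_at_half(DOSES, ratios)
print([round(r, 7) for r in ratios])
print(round(estimate, 4))
```

Since the doses are measured from the true L.D.50, `estimate` is directly the error due to inconsistency, in units of λ.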

Now, the standard error is λ√{(2 × 0.66)/n} = 0.47λ when n = 6.
So that the error due to inconsistency of estimation is about a third
of the standard error, which is not large for such a wide dose
interval. For a dose interval λ, with an asymmetry of a
quarter of a dose interval, the error was found to be 0[...]
or quite negligible.

Karber’s formula becomes clearer if we imagine the standard
curve represented in the form of a normal frequency curve, and
ordinates corresponding to the observed doses X₁, X₁ + d, X₁ + 2d
. . . drawn. Then the frequencies to the left of these ordinates
will be p₁, p₂, p₃ . . . the true proportional responses corresponding
to these doses. The areas of the successive frequency groups will
therefore be p₂ − p₁, p₃ − p₂, p₄ − p₃ . . . and these may be
estimated by p′₂ − p′₁, p′₃ − p′₂ . . . p′ᵣ − p′ᵣ₋₁, the observed
proportional frequencies. We shall have p′₁ = 0, p′ᵣ = 1, if a
sufficient number of doses be taken. If the frequency groups be
supposed concentrated at the mid-points of the intervals X₁ + ½d,
X₁ + 3d/2, X₁ + 5d/2 . . . the L.D.50 may be estimated by the mean
value of X, or by

$$(p'_r - p'_{r-1})\left(X_r - \tfrac{1}{2}d\right) + (p'_{r-1} - p'_{r-2})\left(X_r - \tfrac{3}{2}d\right) + \dots + (p'_2 - p'_1)\left(X_r - \left(r - \tfrac{3}{2}\right)d\right)$$

With p′ᵣ = 1, p′₁ = 0 this is equal to

$$X_r - d\left(\tfrac{1}{2}p'_r + p'_{r-1} + p'_{r-2} + \dots + p'_2\right)$$

which is Karber’s formula. Thus Karber’s formula is seen to be
identical with the ordinary formula for the mean of a grouped
frequency distribution.
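The identity can be illustrated with a small numerical check. The observed proportions below are hypothetical (a group of 6 animals per dose with 0, 1, 3, 5 and 6 responding), and the function names are invented for the example.

```python
import math


def karber(x_top, d, props):
    """Karber's estimate of the log L.D.50.

    props are the observed proportions p'_1 ... p'_r at equally spaced
    log-doses ending at x_top, with p'_1 = 0 and p'_r = 1.
    """
    inner = 0.5 * props[0] + sum(props[1:-1]) + 0.5 * props[-1]
    return x_top - d * inner


def grouped_mean(x_top, d, props):
    """Mean of the grouped distribution of individual effective log-doses,
    each group being placed at the mid-point of its dose interval."""
    r = len(props)
    total = 0.0
    for i in range(1, r):
        frequency = props[i] - props[i - 1]
        upper_dose = x_top - (r - 1 - i) * d
        total += frequency * (upper_dose - 0.5 * d)
    return total


# Hypothetical data: log-doses 0, 0.5, 1.0, 1.5, 2.0.
observed = [0.0, 1 / 6, 3 / 6, 5 / 6, 1.0]
print(karber(2.0, 0.5, observed), grouped_mean(2.0, 0.5, observed))
```

The two functions agree whenever the lowest dose gives no response and the highest gives a complete response, which is the condition stated in the text.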

The problem of examining any inconsistency of estimation in 
Karber’s method thus amounts to finding the error due to asymmetry 
in the mean of a grouped normal frequency distribution. 

Its maximum value has been shown by Fisher to be equal to*

$$\frac{a}{\pi}\,e^{-2\pi^{2}\lambda^{2}/a^{2}}$$

where a is the class interval.

When a = λ, this amounts to the minute quantity (9 × 10⁻¹⁰)λ.
When a = 2λ it amounts to 0.00458λ, compared with a standard
error when n = 6 of 0.434λ, again quite negligible. These results
agree with a direct numerical calculation, with an asymmetry of a
quarter of a class interval.
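The agreement between the formula and a direct calculation can be sketched as follows. The expression coded in `fisher_max_error` is the form (a/π)e^(−2π²λ²/a²), taken here as an assumption consistent with the value 0.00458λ quoted for a = 2λ; the direct calculation applies Karber's formula to the population proportions, with the doses running far enough on either side for the extreme proportions to be 0 and 1. All names are invented for the example.

```python
import math
from statistics import NormalDist

STD = NormalDist()


def fisher_max_error(a):
    """Maximum periodic error of the grouped mean; a in units of lambda."""
    return (a / math.pi) * math.exp(-2.0 * math.pi ** 2 / a ** 2)


def karber_population_error(a, offset, steps=5):
    """Karber's estimate minus the true L.D.50 for population proportions.

    Doses lie at offset + k*a (k = -steps ... steps), measured from the
    true L.D.50 in units of lambda.
    """
    doses = [offset + k * a for k in range(-steps, steps + 1)]
    p = [STD.cdf(x) for x in doses]
    inner = 0.5 * p[0] + sum(p[1:-1]) + 0.5 * p[-1]
    return doses[-1] - a * inner


# Nearest dose a quarter of an interval below the L.D.50, with a = 2 lambda.
direct = karber_population_error(2.0, -0.5)
print(round(fisher_max_error(2.0), 5), round(direct, 5))
print(fisher_max_error(1.0))
```

A symmetrical arrangement, with a dose at the L.D.50 itself or half an interval from it, gives a direct error of zero apart from rounding.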

Thus in no case is the error investigated in this section of any 
appreciable importance, but it seemed to me that the point required 
investigation. 

* The exact formula is,

$$2c\left(e^{-\lambda^{2}/2c^{2}}\sin\theta - \tfrac{1}{2}\,e^{-4\lambda^{2}/2c^{2}}\sin 2\theta + \tfrac{1}{3}\,e^{-9\lambda^{2}/2c^{2}}\sin 3\theta + \dots\right)$$

with c = (a/2π), θ = (2πξ/a) where ξ is the distance from the mean, of the
nearest class boundary below it. All terms but the first are negligible in
comparison with it, and the first term gives the maximum error when ξ = ¼a.




V. Special Methods. 

lines the methods described in the previous sections are 
y or inapplicable. Two examples in which this is the 
-gangrene antitoxin (Vibrion Septique) and Vitamin C —will 
i Part II, dealing with applications. 


PART II. APPLICATIONS. 

I. Introductory . 

t hope to deal here with the whole field of application of 
methods to biological assay. I shall limit myself to some 
bstances covered by the recent work of the Sub-Committee 
^curacy of Biological Assays of the Pharmacopoeia Cora¬ 
ls These were eight in number, Vitamins A, B v C and D; 

Staphylococcus antitoxin; Gas-Gangrene antitoxins (Vibrion Septique
and Oedematiens) and Antipneumococcus Serum (Type I). I shall only
deal here with Vibrion Septique, Vitamins C and D, and Antipneumococcus
Serum. Much as I should like to have referred to the work
done on insulin, space and time do not permit.

I will quote a few paragraphs from the report of the Sub-Com¬ 
mittee to give as clear an idea as possible of the scope of their work. 

“ The simplest and most satisfactory method of estimating the error 
of a test is to obtain independent estimates from different laboratories of 
the potency of a given sample or series of samples. The variance is usually 
mainly due to the sampling error of the test animals, but it is also partly 
due to technical errors, errors of estimation, errors of grouping and other 
unknown errors. The disadvantage of this direct method of estimating the 
error is that the test must be repeated a large number of times in order to 
make the estimate sufficiently accurate. This repetition involves much 
labour. 

“ Data obtained in this direct way by repeating the test are available 
for most of the tests of antitoxins and have been used in the calculations 
discussed below. It will be seen that owing to the limited extent of the data
the estimates of error are themselves subject to a rather large error.
Nevertheless, these estimates are probably accurate enough for practical purposes.

“ The evidence for the error of most of the tests for vitamins is more 
extensive, but less direct. It depends on a knowledge of the sampling 
variance of the biological effect in animals, and on the shape of the curve 
connecting dose and effect. From such data it is not difficult to calculate
the contribution of the sampling variance of the effect to the error of the
test, in the case where the mean effect of the preparation under test is equal
to the mean effect of a Standard Preparation. This has been done and has
been given as the total error of the tests. The errors calculated in this
way are usually so large that it is clearly justifiable to neglect other sources 
of error in comparison. In practice the mean effects of the preparations 
will not be exactly equal, and it will generally be necessary to adopt some 
device to allow for this inequality. For example, some assumption can 
be made as to the shape of the dose-effect curve, or an estimate can be made 
of the shape of the curve in each experiment. Errors due to this source 
might, therefore, depend on the variability of the shape of the curve or on 
the method used to estimate this shape. There is not much evidence about
the variability of the shape of the curve, and the Pharmacopoo
specify methods for estimating the shape in each experiment, 
to this source have, therefore, been neglected in relation to t 
assays. 

“ The data available for tests of antipneumococcus serum ax
extensive and a more complete calculation of the error has beei 

“ The Sub-Committee have devoted much thought to the 
the best method of expressing the results of their calcula 
Pharmacopoeia has hitherto been content with a certain lack of pr 
attitude towards errors of assay. Chemical methods of assay a 
accurate that it has never been necessary to measure their erro 
errors are only vaguely known. The fact that the errors < 
methods are usually large has made it necessary to measure the 
errors can be stated with scientific precision. This was done i 
in the British Pharmacopoeia, 1932, but no uniform method 
the results was adopted. The Sub-Committee are of the opin 
desirable to adopt a uniform notation for these errors, and recc 
this should be done when the next British Pharmacopoeia is p 

“ The usual method of expressing the error of any test is
standard error. If this notation were adopted in the Pharmacc 
is a danger that persons unacquainted with statistics might sit 
the tests were more accurate than they are in fact. The Sub-Committee
are of the opinion that the figure given in the Pharmacopoeia should be 
some multiple of the standard error and should represent an error which is 
scarcely ever exceeded in practice.” 

Then follows an explanation of the term “limits of error” to 
which reference has already been made in the previous sections. 


II. Gas-Gangrene Antitoxin (Vibrion Septique).

The Pharmacopoeia’s suggested methods for the assay of Vibrion
Septique are given in Report No. 9 of the Pharmacopoeia Commission’s
Committees (pp. 11-15).

There are two methods. One uses intravenous injection into 
mice and the other intracutaneous injection into guinea-pigs. The 
principle of either method is first to find a “ test dose ” of toxin which 
will just neutralize a given amount of the standard preparation of 
antitoxin; then to find the amount of the test preparation of anti¬ 
toxin which will just neutralize the test dose. 

(a) Intravenous injection into mice. 

The test dose is first determined. It will be seen from the
detailed instructions that mixtures containing 1 unit of the dilution
of the standard preparation and a varying quantity of the solution
of the toxin are prepared. Each mixture is injected into the tail
vein of each of six mice. With the mixtures usually employed it is
found that when one mixture kills some but not all of the mice, the
mixture next highest, in order of amount of toxin contained, kills all
the mice and the mixture next lowest kills none. Consider for
example the following data (Madsen, Copenhagen, 1934).



                               Table I.

                          Proportion of Mice Surviving.

  Toxin Dose        Provisional American       Proposed International
  (V.S.10)               Solution.                   Solution.
  (mgrm.)          0.2 ml.* = 1 P. unit.       0.2 ml. = 1 P. unit.

    4.7                    0/6                         0/6
    4.4                    3/6                         4/6
    4.1                    6/6                         6/6
    3.8                    6/6                         6/6

  * Millilitres.


The Provisional American solution and the Proposed International 
solution were supposed to have the same potency, and the experiment 
provided confirmation of that fact. Our purpose here, however, is 
to draw attention to the extreme steepness of the dosage-response 
relation; the test dose of toxin is almost certainly between 4.7 and 4.1
milligrams, and no very precise study of the dosage-response relation 
is necessary to determine the test-dose of toxin sufficiently accurately. 

The test dose of toxin having been determined, mixtures con¬ 
taining this test dose and varying amounts of the anti-toxin under 
test are prepared. Again it is found that the dosage-response relation 
is so steep that no detailed study of it is necessary to find with 
sufficient accuracy the amount of antitoxin under test that will 
neutralize it. Madsen's data will again illustrate this point. Using 
test doses of 4.4 and 4.3 milligrams of toxin V.S.10, he prepared
mixtures containing these test doses and 0.9, 1.0 and 1.1 P. units of
antitoxin. The results are as follows :—


                               Table II.

                 Provisional American            Proposed International
                      Solution.                        Solution.
  Toxin Dose
  (V.S.10)     Antitoxin    Proportion of       Antitoxin    Proportion of
  (mgrm.)      P. units.    Mice Surviving.     P. units.    Mice Surviving.

                 0.9            0/6                0.9            0/6
    4.4          1.0            3/6                1.0            0/6
                 1.1            5/6                1.1            6/6

                 0.9            0/6                0.9            0/6
    4.3          1.0            2/6                1.0            1/6
                 1.1            6/6                1.1            6/6


Again it will be seen that the dosage-response relation is extremely 
steep. The amount of antitoxin that will neutralize a test dose of 
4.4 milligrams is almost certainly between 0.9 and 1.1 units.

We give an example (Petrie’s data) of an assay performed by this 
method on an antitoxin of unstated potency. 

SUPP. VOL. IV. NO. 1. C 























“ The limits of error of this test, being based on only eleven determinations,
are not very precisely determined. The value (of the standard error)
found was 4.3 per cent. Actually the true standard error is unlikely to be
outside the limits 3.0 and 7.6 per cent.; on the average only once in twenty
sets of eleven determinations would the true standard error lie outside
limits calculated from the observed error in the same way as these. Even
if the true standard error were as high as 7.6 per cent., the upper limit of
error of 111 per cent. given above would be exceeded in only 7 per cent. of
tests. It is therefore unlikely that this limit will be exceeded.”
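
The figures in the quoted paragraph can be checked roughly as follows. This is only a sketch: it assumes that the limits for the true standard error come from the chi-square distribution of the sample variance on 10 degrees of freedom, and that the 7 per cent. figure is a one-tailed normal probability. The chi-square percentage points are copied from standard tables, and the variable names are illustrative.

```python
import math
from statistics import NormalDist

observed_se = 4.3            # per cent., from eleven determinations
df = 11 - 1                  # degrees of freedom

# 2.5 and 97.5 per cent. points of chi-square on 10 d.f. (standard tables)
CHI2_LOWER, CHI2_UPPER = 3.247, 20.483

# Limits for the true standard error (assumed chi-square argument)
se_low = observed_se * math.sqrt(df / CHI2_UPPER)
se_high = observed_se * math.sqrt(df / CHI2_LOWER)

# Upper limit of error (P = 0.99) based on the observed standard error
upper_limit = 100 + 2.576 * observed_se

# Chance of exceeding that limit if the true standard error were 7.6 per cent.
tail = 1 - NormalDist().cdf((upper_limit - 100) / 7.6)

print(f"limits for the standard error: {se_low:.1f} to {se_high:.1f} per cent.")
print(f"upper limit of error: {upper_limit:.0f} per cent.")
print(f"probability of exceeding it: {100 * tail:.0f} per cent.")
```

With the rounded value 4.3 the upper limit for the standard error comes out slightly below the quoted 7.6, which suggests the report worked from an unrounded figure.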

(b) Intracutaneous injection into guinea-pigs.

The principle of this test is the same as that of the mouse test.
The test dose is first determined. Mixtures containing 0.5 unit of
the standard preparation and varying quantities of the toxin are
prepared. The mixtures are injected into the flanks of a guinea-pig
at suitably spaced intervals. As a rule, two guinea-pigs are used,
and each guinea-pig gets all the mixtures. The test dose of the toxin
is the amount in that mixture which causes at the site of the injection
a small, characteristic, necrotic lesion in the skin of the guinea-pig.
Mixtures containing larger amounts of toxin cause a greater amount
of necrosis with oedema, and mixtures containing smaller amounts of toxin
cause no reaction.

The test dose of toxin having been obtained, mixtures are pre¬ 
pared containing the test dose of toxin and varying amounts of the 
antitoxin under test. All the mixtures are injected into each of two 
guinea-pigs. 

Again, the dosage-response relation is so steep that there is no 
difficulty in determining with sufficient accuracy the amount of 
antitoxin under test that neutralizes the test dose of toxin. There will 
only be one mixture out of those chosen which produces the char¬ 
acteristic reaction, and this must contain half a unit of standard 
antitoxin. A control experiment with the mixture containing the 
test dose of toxin and 0.5 unit of standard antitoxin is always carried
out at the same time. We may illustrate this by the following data 
from Petrie:— 

Table IV.



Mixture Injected in 0*2 ml. 


Antitoxin Solution. 

Toxin (V.S.11) 
Mgrm. 

Antitoxin. 

Result : 48 hours. 

Proposed International 

2*2 

0-5 P. unit 

Small reaction * 

Antitoxin of unstated 
potency 

2-2 

0*0028 ml. 
0*0029 „ 
0*0031 „ 
0*0033 „ 
0*0036 „ 

Large reaction 

it )> 

Small reaction * 

>» » 

* These reactions were equal.







Since 0.0033 ml. of the antitoxin of unstated potency contains 0.5
unit, the antitoxin is found to contain 152 units per ml.

This test, like the previous one, is clearly an accurate test; a 
factor contributing to the accuracy is the elimination of animal 
variations by using the same guinea-pig for the mixtures compared. 

The only way of estimating the error is to examine a series of 
independent determinations; unfortunately there are only five 
available (156, 160, 157, 152 and 160 units per ml.). These have a
mean of 157 and a standard deviation of 3.3 or 2.1 per cent. The
limits of error (P = 0.99) are, therefore, on this basis, 94-105 per cent.
The 2½ per cent. fiducial limits for the standard error are 1.3 and
6.1 per cent.; accordingly in the Committee’s report the following
paragraph was inserted :— 

“ Actually the true standard error is unlikely to lie outside the limits
1.3 and 6.1 per cent.; on the average only once in twenty sets of five deter¬
minations would the true standard error lie outside limits calculated from 
the observed error in the same way as these. These limits are, however, 
too wide for any stress to be laid on the actual figures obtained.” 
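
The arithmetic behind these figures can be retraced in a few lines. The sketch below assumes that the fiducial limits for the standard error were obtained from the chi-square distribution on 4 degrees of freedom; the two percentage points are taken from standard tables, and the variable names are illustrative only.

```python
import math
import statistics

potencies = [156, 160, 157, 152, 160]      # units per ml., five determinations

mean = statistics.mean(potencies)
sd = statistics.stdev(potencies)           # divisor n - 1
sd_pct = 100 * sd / mean                   # standard deviation as a percentage

# Limits of error (P = 0.99) on the percentage scale
lower = 100 - 2.576 * sd_pct
upper = 100 + 2.576 * sd_pct

# 2.5 and 97.5 per cent. points of chi-square on 4 d.f. (standard tables)
CHI2_LOWER, CHI2_UPPER = 0.4844, 11.1433
df = len(potencies) - 1
se_low = sd_pct * math.sqrt(df / CHI2_UPPER)
se_high = sd_pct * math.sqrt(df / CHI2_LOWER)

print(f"mean {mean}, s.d. {sd:.1f} ({sd_pct:.1f} per cent.)")
print(f"limits of error about {lower:.1f} to {upper:.1f} per cent.")
print(f"limits for the standard error {se_low:.1f} to {se_high:.1f} per cent.")
```

The width of the second interval, roughly a five-fold range, shows why the Committee declined to lay stress on the figures.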

It was recommended that the following statement should be inserted 
in the Addendum to the British Pharmacopoeia, 1932 :— 

“ Limits of Error:—The data at present available do not permit of a 
sufficiently accurate determination of the limits of error, but they are 
certainly not wider than the limits for the test by intravenous injection into 
mice.” 

In spite of the limited data available, there is no doubt that this 
test is, for a biological assay, an exceptionally accurate one. 

III. The Biological Assay of Antiscorbutic Vitamin 
(Vitamin C).

There are two methods of assaying Vitamin C. The details of 
the Pharmacopoeia’s suggested methods are given in Report No. 9 of 
the Pharmacopoeia Commission’s Committees (pp. 63-5).

The first method depends on an examination of the histological 
structure of the teeth, the second method is a growth test. Both 
these tests provide examples of cases where special methods have to
be used. 

(a) Changes in the Histological Structure of the Teeth.

The test is a curative test. When guinea-pigs are fed on diets 
deficient in Vitamin C, changes are produced in the structure of their 
teeth. The guinea-pigs receive a basal diet free from Vitamin C for 
fourteen days. Then two groups of ten guinea-pigs are taken. 
Those in one group receive daily doses of the preparation being tested, 
those in the other group receive daily doses of the Standard Preparation,
for fourteen days. A useful dose of the Standard Preparation
is 1 milligram. The preparation of ascorbic acid being tested is
given in the same daily dose, 1 milligram.

At the end of the experiment the guinea-pigs are killed, and the 
lower jaw-bones are removed and decalcified. Sections are cut of
the root of the incisor at the region of the bend of the jaw-bone. 
These sections are stained. The extent of the disorganization of the 
structure is estimated by comparing the appearances with those 
shown in a graded series of sections derived from guinea-pigs which 
have received different doses of the Standard Preparation with the 
same basal diet. The sections are numbered from 0 to 4 in order
of degree of protection from scurvy. The mean degree of protection 
from scurvy is calculated for each group. If the responses of the 
two groups are equal, the activity of the test preparation is equal to 
that of the standard preparation; if not, the experiment is repeated 
with a dose of the test preparation which it is judged will give a 
response equal to that of the standard. 

The data used for calculating the error of this test were taken from
the published results of Key and Elphick (ref. 14) and of Key and
Morgan (ref. 15). The variance of the response in each group of
animals simultaneously receiving any one dose was calculated, and
these variances were plotted against the mean response. It was found
that the variance was low when the mean response was less than 1
or greater than 3.5. This was due to the fact that the animals at
these levels must have given a response either near zero or maximal.
An animal cannot do less than show no improvement or more than
be entirely cured. Otherwise there was no obvious correlation
between variance and mean effect. The variance was therefore
calculated from the data corresponding to a mean effect between 1 and
3.5. This gave observations on 234 animals in 44 groups. The
variance was thus estimated as 0.888.

The mean degree of protection from scurvy among 62 animals
receiving no Vitamin C was 0.794 with a standard error of 0.0786.
The variance of the difference between this observation and the
mean of observations on 10 animals would be $(0.0786)^2 + 0.888/10$,
or 0.095. The result of an experiment can therefore be taken as
evidence (P = 0.99) of the presence of Vitamin C only if the mean
response is greater than

$$0.794 + (2.576 \times \sqrt{0.095}), \quad \text{or } 1.59.$$

The standard error of the mean response of a group of ten guinea-pigs
treated with either preparation is $\sqrt{0.888/10} = 0.298$, and of
the difference between two such means 0.421.
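
As a check on the arithmetic of the last two paragraphs, the quantities can be recomputed directly from the figures given in the text. The sketch below uses illustrative variable names and nothing beyond those figures.

```python
import math

response_variance = 0.888    # variance of a single animal's response
group_size = 10
control_mean = 0.794         # mean response of 62 animals given no Vitamin C
control_se = 0.0786          # standard error of that mean

# Variance of (mean of 10 treated animals - control mean)
var_diff_from_control = control_se ** 2 + response_variance / group_size

# Smallest mean response giving evidence (P = 0.99) of Vitamin C
threshold = control_mean + 2.576 * math.sqrt(var_diff_from_control)

# Standard errors for one group of ten and for the difference of two groups
se_mean = math.sqrt(response_variance / group_size)
se_diff = se_mean * math.sqrt(2)

print(f"variance of difference: {var_diff_from_control:.3f}")
print(f"threshold response:     {threshold:.2f}")
print(f"s.e. of one mean:       {se_mean:.3f}")
print(f"s.e. of a difference:   {se_diff:.3f}")
```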

Now, in this test the relation between response and dosage 
measured on an arithmetic scale was found to be linear. It follows
from this that the percentage limits of error will be different at
different levels of dosage. This is where this test differs from the 
usual case in which the relation between response and dosage measured 
on a logarithmic scale is linear. The dosage-response relation is of 
the form, 

$$y - y_0 = bx \qquad (1)$$

which gives $\delta y = b\,\delta x$, or

$$\frac{\sigma_x}{x} = \frac{\sigma_y}{y - y_0} \qquad (2)$$

This gives the standard deviation of the estimate of dosage when the
corresponding mean effect is $y$, in terms of $\sigma_y$, the standard deviation
of response at this level. Now, when a dose $x$ of each preparation is
given, let the mean effects be $y_1$ and $y_2$ respectively, and the values
of $x$ corresponding to these two values of $y$ calculated from (1) be
$x_1$ and $x_2$. Then the potency ratio is estimated as $u = x_2/x_1$, and

$$\frac{\sigma_u^2}{u^2} = \frac{\sigma_{x_1}^2 + \sigma_{x_2}^2}{x^2} = \frac{\sigma_{y_1}^2 + \sigma_{y_2}^2}{(y - y_0)^2}$$

or

$$\frac{\sigma_u}{u} = \frac{\{\sigma_{y_1}^2 + \sigma_{y_2}^2\}^{1/2}}{y - y_0} \qquad (3)$$

Now $\sigma_{y_1} = \sigma_{y_2} = 0.298$, and $y_0$ may be taken as 0.8.
If $y = 2.5$ we find $\sigma_u/u = 0.421/1.7 = 0.248$.

Hence the limits of error (P = 0.99) are

$$100 \pm 257.6 \times 0.248, \quad \text{or 36 and 164 per cent.}$$

If y = 3, we find the limits of error are 51 and 149 per cent. The
Committee therefore recommended the following statement for use
in the addendum to the Pharmacopoeia, 1932 :—

“ Limits of Error:—In an experiment in which the average effect
(degree of protection from scurvy) is estimated for 10 guinea-pigs the
following statements can be made :—

(1) There is no conclusive evidence of the presence of Vitamin C unless
the effect is greater than 1.6.

(2) Two preparations can be shown to differ significantly in their activity
only when their effects differ by more than one unit.

(3) When the effect of each preparation is 2.5 the limits of error (P =
0.99) are 36 and 164 per cent.

When the effect of each preparation is 3.0 the limits of error (P = 0.99)
are 51 and 149 per cent.”
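
The dependence of the percentage limits on the level of response, which follows from the linear dose-response relation, can be illustrated numerically. The function below is a sketch with an illustrative name; it takes the standard error of a difference (0.421) and the zero-dose response (0.8) from the text and returns the limits of error at a chosen mean response.

```python
def linear_limits(mean_response, se_diff=0.421, zero_response=0.8, z=2.576):
    """Limits of error (per cent.) when response is linear in the dose itself."""
    # Relative standard error of the potency ratio at this response level
    relative_se = se_diff / (mean_response - zero_response)
    return 100 * (1 - z * relative_se), 100 * (1 + z * relative_se)

for y in (2.5, 3.0):
    low, high = linear_limits(y)
    print(f"response {y}: {low:.0f} and {high:.0f} per cent.")

# Smallest difference between two group means that is significant at P = 0.99
least_significant_difference = 2.576 * 0.421
print(f"least significant difference: {least_significant_difference:.2f} units")
```

The limits narrow as the response rises because the same absolute error in response corresponds to a smaller fraction of the dose.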


(b) Growth and development of macroscopic lesions of scurvy . 

This is a test in which the intention is to observe two responses. 
For the purpose of calculating the error of the assay it has been
treated as a growth test only. Guinea-pigs are placed on a diet 
free from Vitamin C. On this diet only it is found that guinea- 
pigs of the weight stipulated, derived from a good stock which 
has received cabbage regularly, develop scurvy and die in four to 
five weeks. This diet is supplemented both in the group on the
Standard Preparation and in the group on the preparation tested, by
doses of Vitamin C. The object of the test is to find a dose of the 
preparation tested which gives the same response as the dose of the 
Standard Preparation. Doses are given which result in a subnormal 
growth. 

In every group the daily dose is continued from the start of the 
experiment for six weeks, the animals being weighed twice a week 
throughout. The average growth, during the whole period, of each 
group of animals is calculated. 

The Pharmacopoeia suggests that five guinea-pigs should be used 
in each group; the error has, however, been worked out for ten in 
each group, so as to make possible a comparison with the previous test. 

The calculation of the error is based on experiments with 66 
animals in the laboratories of the Pharmaceutical Society of Great 
Britain. 

The standard deviation of the increase in weight among animals
receiving the same dose was estimated as 21.38 grammes. The
standard error of the difference between the means of two groups of
10 animals would therefore be $(21.38 \times \sqrt{2})/\sqrt{10} = 9.56$. The
deviation (P = 0.99) is $9.56 \times 2.576 = 24.6$.

This test differs from the usual case in that the dosage-response 
curve was found to be of unusual shape, the data being approximately 
fitted by the equation y = 74.3 + 108.2 log (log 10 X), X being
measured in milligrams. The percentage error will therefore be
small when the effect is small and large when the effect is large. 

The error was calculated for the case when the doses of both 
the Standard Preparation and the preparation being tested are just 
sufficient to maintain weight. It was found from the standard 
curve that an increase of 24.6 grammes in weight is produced by a
dose equal to 138.5 per cent. of the dose which just maintains weight,
and a decrease of 24.6 grammes by 82.4 per cent. of this dose.
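
The two percentages can be recovered by inverting the fitted curve. The sketch below assumes that both logarithms are to the base 10. It works with the inner argument of the double logarithm as a whole, because the ratio of two doses is the same whether that argument is the dose or a fixed multiple of it. Function names are illustrative.

```python
INTERCEPT, SLOPE = 74.3, 108.2    # constants of the fitted growth curve

def log_inner_argument(weight_change):
    """log10 of the inner argument t, where y = 74.3 + 108.2 log10(log10 t)."""
    return 10 ** ((weight_change - INTERCEPT) / SLOPE)

def dose_percent_of_maintenance(weight_change):
    """Dose giving this weight change, as a percentage of the maintenance dose."""
    log_ratio = log_inner_argument(weight_change) - log_inner_argument(0.0)
    return 100 * 10 ** log_ratio

upper = dose_percent_of_maintenance(+24.6)
lower = dose_percent_of_maintenance(-24.6)
print(f"limits of error: {lower:.1f} and {upper:.1f} per cent.")
```

The limits are not symmetrical on the logarithmic scale, which reflects the unusual shape of the curve.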

Accordingly the Committee recommended the following paragraph 
for insertion in the Addendum to the British Pharmacopoeia, 1932 :— 

“ Limits of Error:—In an experiment in which 10 guinea-pigs receive 
the Standard Preparation and 10 guinea-pigs receive the preparation 
being tested, in a six-weeks’ test, and in which the dosage of each is just 
sufficient to maintain the mean weight constant, the limits of error 
(P = 0.99) are 82 and 139 per cent. If the mean response is larger the
error is also larger.” 

IV. The Biological Assay of Anti-rachitic Vitamin D. 

The Pharmacopoeia’s suggested methods for the Assay of Vitamin 
D are given in Appendix XV of the British Pharmacopoeia 1932. 
Certain modifications are suggested in Report No. 9 of the 
Pharmacopoeia Commission’s Committees (p. 65). 





There are three methods, all using rats—the X-ray test, the test 
by examining the bones after staining, known as the line test, and 
the ash content of bone test. 

Both in the X-ray test and the line test the rats are fed on a 
rachitogenic diet for a preliminary period of about three weeks. 
They are then divided into two groups, the rats of each litter being 
evenly divided between the groups. The rats in one group receive 
daily doses of the standard preparation, and the rats in the other 
group have daily doses of the preparation being tested throughout 
the “ test period,” which is from ten to fourteen days. At the
end of the test period the rats are killed. 

If the group on the test preparation which gives most nearly the 
same response as the group on the standard preparation be compared 
with it, it is possible to assay one preparation in terms of the other 
and to calculate the error of the assay, provided that the average 
effect in groups of rats of a series of different doses of the standard 
preparation has previously been determined. 

The procedure in the ash content of bone tests is similar, except 
that this is a prophylactic test, while the former two are curative. 
In the ash content of bone test, the rats receive their daily dose of 
Vitamin D for about four or five weeks from the beginning of the 
experiment. 

(a) The X-ray test. 

By means of X-ray photographs of the right knees of rats, 
Bourdillon and his colleagues (ref. 16) constructed a scale of degree of 
healing such that doubling the dose produced a change of 2 units 
in the scale. The scale rises by units from 0, where there is no
calcification, to 12, when the bone is normal and early healing is shown
by absence of perceptible swelling. The above relation holds from
Numbers 4 to 10. They made a very elaborate investigation of all
the errors of this test; for our present purpose it is sufficient to 
note that, 

(1) If response is plotted against the logarithm of the dose
to the base 10, the slope $b = 2/\log 2 = 6.6439$.

(2) The standard deviation of the difference in response
between pairs of litter mates was found, from 1500 pairs of
litter mates receiving doses in the ratio 2 to 1, to be 1.635.

The standard error of the mean difference in response of two
groups of 10 rats is therefore $1.635/\sqrt{10}$, and the standard error
of the logarithm of the potency ratio is $1.635/(b\sqrt{10}) = 0.0778$.

Hence the limits of error (P = 0.99) are 63-159 per cent.
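
The steps from the slope to the limits of error can be followed in a short calculation. The sketch below uses only the figures in the text; the variable names are illustrative.

```python
import math

# Doubling the dose raises the response by 2 units, so on a log10 dose scale:
slope = 2 / math.log10(2)

sd_pair_difference = 1.635    # s.d. of the difference between litter mates
group_size = 10

# Standard error of the mean difference between two groups of ten rats
se_mean_difference = sd_pair_difference / math.sqrt(group_size)

# Standard error of log10(potency ratio): response error divided by the slope
se_log_ratio = se_mean_difference / slope

# Limits of error (P = 0.99), converted back from logarithms to percentages
half_width = 2.576 * se_log_ratio
lower = 100 * 10 ** (-half_width)
upper = 100 * 10 ** half_width

print(f"slope b = {slope:.4f}")
print(f"s.e. of log ratio = {se_log_ratio:.4f}")
print(f"limits of error: {lower:.0f}-{upper:.0f} per cent.")
```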

(b) The line test. 

In the line test, when the rats have been killed at the end of the 
experiment the distal ends of the ulnae and radii or of the tibiae are
removed and stained. The degree of healing is estimated from a
scale devised by Dyer (ref. 17) which runs by units from 0 to 6. The
relation between response and dosage measured on an arithmetic 
scale is not linear, but rises sharply at first and then flattens out. 
The curve also varies according as the initial degree of rickets is 
severe, moderate or slight. The initial degree of rickets can only 
be inferred from the state of the bones, at the end of the experiment, 
of those rats which received no Vitamin D. For this purpose it is 
called severe if the degree of healing of a rat which has received no 
Vitamin D is below 1.0, moderate if the degree of healing lies between
1.0 and 3.0 and slight if it is greater than 3.0.

For the purpose of calculating the error of the test for the Pharma¬ 
copoeia, the curve of response used was that obtained by Key and 
Morgan (ref. 18) for a severe initial degree of rickets, from seventeen
litters of five rats each, litter mates receiving five different doses. 
This curve does not depart very sensibly from the logarithmic form. 
Doses of 0.25, 0.5, 1.0 and 2.0 units of Vitamin D give responses of
3.25, 4.05, 4.70 and 5.20 respectively.

The experiment is arranged so that as far as possible the average 
response of the groups compared is in the neighbourhood of 3.0. In
this neighbourhood the curve is very closely logarithmic. 

The standard deviation of the response was calculated from the 
differences between 313 pairs of litter mates receiving the same dose, 
and was found to be 0.78. The standard deviation of the difference
between two mean responses, each calculated from 10 rats, is therefore
$(0.78\sqrt{2})/\sqrt{10}$ or 0.349.

The abscissa of the response curve corresponding to the mean 
degree of healing 3.0 is 0.205. That corresponding to the mean
+ 2.576 times the standard error (3.899) is 0.440, and that corresponding
to the mean - 2.576 times the standard error (2.101) is
0.100. Hence the limits of error (P = 0.99) are 49-215 per cent.*

(c) The ash-content of bone test.

The response here is the percentage of ash in the dry extracted 
bone (either femora or humeri). 

The calculation of the error was based on data from five experi¬ 
ments provided by Coward and the published data of Hume, 
Pickersgill and Gaffikin (ref. 19 ). 

In each experiment the average percentage of ash in the bones 
was calculated for each dose given, and these were plotted against 
the logarithms of the dose to the base 10. Regression lines were 
fitted and the slopes of the regression lines were obtained. One of 
Coward’s experiments could not be used for this purpose, as only 
single doses of the preparations compared were given. 

* (49 × 215) = 10,535, which is not materially different from 10,000.
Hence the logarithmic nature of the curve in this region is verified.



In each experiment the variance of the difference in response 
between litter mates was obtained by taking the available pairs of 
litter mates in which the first and second members respectively
had received the same dose. The average variance for all the
experiments was then obtained. This proved to be 15.69, and the
corresponding standard deviation 3.96 per cent.

The following table shows the number of rats used in each 
experiment, the values of the slope, and the number of pairs of litter 
mates used for calculating the variance of the difference in response. 


                               Table V.

                                      Number of                Number of
                                      Rats Used.    Slope.     Pairs for
                                                                Variance.

  Coward’s Experiments.      (1)         36          14.3          26
                             (2)         35          13.7          28
                             (3)         20           —            10
                             (4)         35          28.3          29
                             (5)         36          27.5          30
  Hume, Pickersgill and Gaffikin        120          15.6          96

                                        282                       219


As explained above, the error was calculated only for the case 
where the response to the Standard Preparation is equal to the response 
to the preparation being tested. The slope of the response curve 
varies from 14 to 28 per cent. per ten-fold increase in dose. The
value of 14 has been used in calculating the limits of error, and this
gives the largest error and will tend to compensate (cf. Part I,
Section II d) for any additional error that would be introduced by
the inequality of the responses to the two preparations.

The standard error of the mean difference in response of two
groups of ten rats is $3.96/\sqrt{10} = 1.252$, and the standard error
of the logarithm of the potency ratio $1.252/14 = 0.0894$. Hence it
follows that the limits of error (P = 0.99) are 59-170 per cent.
This is the figure that has been adopted for the Pharmacopoeia.
The weighted mean of the above slopes is 21.1, and with this value
of the slope the limits of error (P = 0.99) are 70-142 per cent.
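
The effect of the choice of slope on the limits can be seen by repeating the calculation for both values. The sketch below uses the standard deviation and slopes quoted in the text; the function name is illustrative.

```python
import math

SD_PAIR_DIFFERENCE = 3.96    # per cent. ash, s.d. of difference between litter mates
GROUP_SIZE = 10

def ash_test_limits(slope, z=2.576):
    """Limits of error (per cent.) for a slope in per cent. ash per ten-fold dose."""
    se_mean_difference = SD_PAIR_DIFFERENCE / math.sqrt(GROUP_SIZE)
    se_log_ratio = se_mean_difference / slope
    half_width = z * se_log_ratio
    return 100 * 10 ** (-half_width), 100 * 10 ** half_width

for slope in (14, 21.1):
    low, high = ash_test_limits(slope)
    print(f"slope {slope}: limits {low:.0f}-{high:.0f} per cent.")
```

Taking the flattest observed slope therefore gives the widest, most cautious limits.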

V. The Error of Vitamin Tests in General . 

Limits of Error have been calculated for the Pharmacopoeia for 
Vitamins A, B1, C and D, both for (P = 0.99) and (P = 0.95).
The methods of assaying Vitamins C and D have been described in
this paper; want of space forbids a similar treatment of Vitamins
A and B1. For these the reader must be referred to Reports Nos. 9
and 10 of the Committees of the Pharmacopoeia Commission (ref. 20).
The table on p. 43 gives the results. 










                               Table VI.

               Limits of error (P = 0.99 and P = 0.95) *
                   with varying numbers of animals.


                               Vitamin A.

                    (a) 3 Weeks’ Test.            (b) 5 Weeks’ Test.
                 (P = 0.99)    (P = 0.95)      (P = 0.99)    (P = 0.95)
                  per cent.     per cent.       per cent.     per cent.

  20 rats        30 and 339    40 and 253      37 and 272    47 and 214
  40 rats        42 and 237    52 and 193      49 and 203    58 and 171
  80 rats        54 and 184    63 and 159      61 and 165    68 and 146


                               Vitamin B1.

                       (a) Pigeons.                          (b) Rats.
                 (P = 0.99)    (P = 0.95)                 (P = 0.99)    (P = 0.95)
                  per cent.     per cent.                  per cent.     per cent.

  20 pigeons     15 and 652    24 and 417     10 rats     65 and 154    72 and 139
  40 pigeons     27 and 377    36 and 274     20 rats     74 and 135    79 and 126
  80 pigeons     39 and 225    49 and 204     40 rats     81 and 124    85 and 118


                               Vitamin C.

                                 (a) Teeth.                  (b) Growth.
                           (P = 0.99)   (P = 0.95)     (P = 0.99)   (P = 0.95)
                            per cent.    per cent.      per cent.    per cent.

  20 guinea pigs  y = 3    51 and 149   63 and 137     82 and 139   86 and 126
  40 guinea pigs  y = 3    65 and 135   74 and 126     86 and 124   89 and 117
  80 guinea pigs  y = 3    76 and 124   81 and 119     90 and 115   92 and 111

  20 guinea pigs  y = 2.5  36 and 164   51 and 149
  40 guinea pigs  y = 2.5  55 and 145   66 and 134
  80 guinea pigs  y = 2.5  68 and 132   76 and 124

  (The growth test figures are for a level of dosage just sufficient to
  maintain weight for six weeks.)


                               Vitamin D.

                       (a) X-ray.             (b) Ash Content of Bone.
                 (P = 0.99)    (P = 0.95)      (P = 0.99)    (P = 0.95)
                  per cent.     per cent.       per cent.     per cent.

  20 rats        63 and 159    70 and 142      59 and 170    67 and 150
  40 rats        72 and 139    78 and 128      69 and 146    75 and 133
  80 rats        79 and 126    84 and 119      77 and 130    82 and 122

                     (c) Line Test.
                 (P = 0.99)    (P = 0.95)
                  per cent.     per cent.

  20 rats        49 and 215    59 and 176
  40 rats        61 and 168    68 and 146
  80 rats        71 and 144    78 and 129


* P = 0.95 indicates that the result of the test will be within the given limits
95 times out of every 100 times that the test is made. 
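
The way the limits narrow with the number of animals can be illustrated from the X-ray test for Vitamin D, for which the text gives all the constants needed. The sketch below assumes that the animals are divided equally between the two preparations, so that "20 rats" means ten on each, and that the standard error of the logarithm of the potency ratio falls as the square root of the group size. The function name is illustrative.

```python
import math

SD_PAIR_DIFFERENCE = 1.635            # from the X-ray test
SLOPE = 2 / math.log10(2)             # response units per ten-fold dose
NORMAL_DEVIATE = {0.99: 2.576, 0.95: 1.960}

def xray_limits(total_rats, probability):
    """Rounded limits of error (per cent.) for a given total number of rats."""
    per_group = total_rats / 2        # assumed equal split between preparations
    se_log_ratio = SD_PAIR_DIFFERENCE / (SLOPE * math.sqrt(per_group))
    half_width = NORMAL_DEVIATE[probability] * se_log_ratio
    return round(100 * 10 ** (-half_width)), round(100 * 10 ** half_width)

for rats in (20, 40, 80):
    p99 = xray_limits(rats, 0.99)
    p95 = xray_limits(rats, 0.95)
    print(f"{rats} rats: {p99[0]} and {p99[1]} (P = 0.99), "
          f"{p95[0]} and {p95[1]} (P = 0.95)")
```

Quadrupling the number of animals only halves the standard error, which is why the limits close in slowly.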









It is perhaps permissible for a statistician, when he sees such a 
large error as that given by the Pigeon Test for Vitamin B1, to ask
his pharmacological colleagues whether an improved technique 
giving a more uniform response or a steeper dosage-response relation 
cannot be devised. 

VI. The Biological Assay of Antipneumococcus Serum. 

The Pharmacopoeia’s suggested methods for the assay of anti¬ 
pneumococcus serum are given in Report No. 9 of the Pharmacopoeia 
Commission’s Committees (pp. 20-24). There are two methods. 
Both depend on comparing the amounts of a test and a standard 
preparation of antipneumococcus serum necessary to neutralize a 
given amount of a test culture of diplococcus pneumoniae. In the 
first method, mixtures of the serum being tested and the test dose 
of the culture are injected intraperitoneally into mice. In the second 
method the intravenous injection into mice of the serum being tested 
is followed by the intraperitoneal injection of the test dose of the 
culture. 

In either case doses of the serum being tested and of the Standard 
Preparation, so adjusted as to be well spread over the range of 
mortality between zero and 100 per cent., are given to groups of
mice. In the first method, five doses of each preparation are used; 
in the second, two doses of the standard preparation and three of 
the test preparation. The assay of antipneumococcus serum forms 
a complete contrast with the assay of such a preparation as gas- 
gangrene antitoxin (Vibrion Septique). In the former case a sixteen¬ 
fold range of dosage is necessary to cover the range of mortality 
between zero and 100 per cent.; in the latter case, the whole range
is covered by doses between two extremes which differ by perhaps 
15 or 20 per cent. 

The method of calculating the error of an assay of antipneumococcus
serum has already been explained in detail (Part I, Section
III g). We therefore go straight on to describe how the available 
data were used to calculate the limits of error for the Pharmacopoeia. 

(a) The Method of Intraperitoneal Injection into Mice of Mixtures of 
the Serum being tested and the test dose of the Culture. 

The calculations are based on data (in part unpublished) provided 
by Wilson Smith (ref. 21) and Trevan and Brown. In each of seven
comparative tests by Wilson Smith of two sera with 200 animals 
receiving the serum being tested and 200 receiving the Standard 
Preparation, the percentage of mice protected by each dose was 
converted into a normal equivalent deviation or probit and 
plotted against the logarithm of the dose to the base 10 . 
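
The conversion of percentages into normal equivalent deviations and the fitting of a line against the logarithm of the dose may be sketched as follows. This is an illustration only: the doses and proportions are hypothetical, the function names are not from the paper, and the fit is an ordinary unweighted least-squares line, whereas the method of Part I weights each point.

```python
from math import log10
from statistics import NormalDist


def normal_equivalent_deviation(fraction):
    """Normal equivalent deviation of a proportion strictly between 0 and 1.

    Bliss's probit is this quantity plus 5; the slope is the same either way.
    """
    return NormalDist().inv_cdf(fraction)


def fit_line(xs, ys):
    """Unweighted least-squares slope and intercept of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def probit_line(doses, fractions):
    """Slope b and intercept of the NED against log10(dose)."""
    xs = [log10(d) for d in doses]
    ys = [normal_equivalent_deviation(p) for p in fractions]
    return fit_line(xs, ys)


# Hypothetical figures: five doses over a sixteenfold range.
doses = [1, 2, 4, 8, 16]
protected = [0.10, 0.30, 0.55, 0.80, 0.95]
b, a = probit_line(doses, protected)
print(f"slope b = {b:.2f}, intercept = {a:.2f}")
```

Proportions of exactly 0 or 1 have no finite deviation and are rejected by `inv_cdf`; how such groups are to be handled is a matter for the weighting scheme and is not settled by this sketch.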

Regression lines were calculated and the slopes estimated. The
standard error of the logarithm of the result was then calculated
from equation (42). The value of $\sigma_M$ thus obtained was multiplied
by $\sqrt{2}$ to obtain the value corresponding to 100 animals receiving
the Standard Preparation and 100 animals receiving the serum
being tested. The results were as follows:


Table VII.

Test.    $\sigma_M$.    $2.576\,\sigma_M$.    Limits of Error (per cent.).

  1       0.0927         0.2388                58-173
  2       0.0836         0.2154                61-164
  3       0.0988         0.2546                56-180
  4       0.0847         0.2183                60-165
  5       0.0899         0.2316                59-170
  6       0.1141         0.2939                51-197
  7       0.1034         0.2664                54-185


The average of these values is 57-176 per cent. 
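
The passage from a standard error on the logarithmic scale to percentage limits of error may be sketched as below. The factor 2.576 is the normal deviate for P = 0.99; the function name is an assumption of this illustration. The limits obtained agree with the tabulated ones to within a unit in the last figure, the small differences presumably arising from rounding of $\sigma_M$.

```python
def limits_of_error(sigma_m, factor=2.576):
    """Percentage limits of error from the standard error of log10(ratio).

    The half-width factor * sigma_m is taken on the log10 scale and
    converted back, so the limits are 100 * 10**(-h) and 100 * 10**(+h).
    """
    half_width = factor * sigma_m
    return 100 * 10 ** (-half_width), 100 * 10 ** half_width


# Standard errors as read from the table of seven comparative tests.
for test, sigma_m in enumerate(
    [0.0927, 0.0836, 0.0988, 0.0847, 0.0899, 0.1141, 0.1034], start=1
):
    lower, upper = limits_of_error(sigma_m)
    print(test, f"{lower:.1f}-{upper:.1f}")
```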

In view of the availability of further data comprising seven 
single tests by Wilson Smith and the slopes, with their weights, 
from sixty-seven single experiments by Trevan and Brown, the 
situation was explored further. 

The average slope was obtained: (1) for Wilson Smith’s tests,
(2) for Trevan and Brown’s tests and (3) for all tests. If an average
value of the slope is used for the estimation of the ratio, instead of
the value actually obtained from the comparative test itself, $\sigma_b^2$ in
equation (42) must be equated to the real variance in slope added
to the variance of estimate of the average (the latter quantity being
a small correction). The real variance in slope may be obtained
by deducting from the variance of the slopes of all available experiments
the average variance of estimate.

The results were as follows :— 

Table VIII.

                                          Wilson     Trevan and    All
                                          Smith.     Brown.        Tests.

No. of Tests                              17*        67            84
Mean b                                    2.48       3.16          3.02
Total Variance of Slope                   0.1058     1.1671        1.0241
Average Variance of Estimate of Slope     0.1069     0.6877        0.5702
Variance of Estimate of Mean b            0.0063     0.0103        0.0068
$\sigma_b^2$                              0.0052     0.4897        0.4607
$\sigma_b$                                0.0721     0.6998        0.6788


* These values were obtained from five double tests (numbers 1-5 above),
and seven single tests. Numbers 6 and 7 above, which were dealt with by
Gaddum in his report, were not used.
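
The arithmetic behind the last two rows of Table VIII, as described in the text, may be sketched as follows: the real variance in slope is the total variance less the average variance of estimate, and to it is added the variance of estimate of the mean slope. The function name is an assumption of this illustration. No clamping at zero is applied, which appears to be how the Wilson Smith column was reached, since there the deduction leaves a small negative remainder.

```python
from math import sqrt


def slope_variance(total_var, avg_var_of_estimate, var_of_mean):
    """sigma_b^2 = (total variance - average variance of estimate)
    + variance of estimate of the mean slope."""
    real_variance = total_var - avg_var_of_estimate
    return real_variance + var_of_mean


columns = {
    "Wilson Smith": (0.1058, 0.1069, 0.0063),
    "Trevan and Brown": (1.1671, 0.6877, 0.0103),
    "All tests": (1.0241, 0.5702, 0.0068),
}
for name, figures in columns.items():
    variance = slope_variance(*figures)
    print(f"{name}: sigma_b^2 = {variance:.4f}, sigma_b = {sqrt(variance):.3f}")
```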





There is clearly no significant evidence of real variation in slope 
in Wilson Smith’s tests. 

Equation (42) was now used to calculate the error, taking
$w = 0.415$, $\bar{y}_2 - \bar{y}_1 = 0.13$ (the average values in Wilson Smith’s
tests), for Wilson Smith’s data and for all the data together.

The results were as follows :— 


Table IX.

                                                  $\sigma_M$.   $2.576\,\sigma_M$.   Limits of Error
                                                                                     (per cent.).

Wilson Smith: $b = 2.48$, $\sigma_b = 0$,
  $w = 0.415$, $\bar{y}_2 - \bar{y}_1 = 0.13$     0.0885        0.2280               59-169

All data: $b = 3.02$, $\sigma_b = 0.68$,
  $w = 0.415$, $\bar{y}_2 - \bar{y}_1 = 0.13$     0.0734        0.1891               65-155
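
Equation (42) is given in Part I and is not reproduced in this section. The sketch below therefore rests on an assumption: it uses the usual large-sample form, a sampling term $(1/n_1 w + 1/n_2 w)/b^2$ plus a slope term $(\bar{y}_2 - \bar{y}_1)^2 \sigma_b^2 / b^4$, chosen because it returns the standard errors of Table IX from the constants quoted there. It should be read as a check on the arithmetic, not as a statement of equation (42) itself; the function and parameter names are likewise illustrative.

```python
from math import sqrt


def sigma_m(b, sigma_b, w, n_standard, n_test, y_diff):
    """Standard error of the log10 potency ratio (assumed form).

    b          mean slope of the probit / log-dose line
    sigma_b    standard deviation assigned to the slope
    w          average weighting coefficient per animal
    n_*        animals on the standard and on the test preparation
    y_diff     difference of the mean probits of the two preparations
    """
    sampling_term = (1 / (n_standard * w) + 1 / (n_test * w)) / b ** 2
    slope_term = y_diff ** 2 * sigma_b ** 2 / b ** 4
    return sqrt(sampling_term + slope_term)


# Constants as quoted for Table IX, with 100 animals on each preparation.
print(round(sigma_m(2.48, 0.0, 0.415, 100, 100, 0.13), 4))
print(round(sigma_m(3.02, 0.68, 0.415, 100, 100, 0.13), 4))
```

With so small a difference between the mean probits the slope term contributes very little, which is why the uncertainty of the average slope costs almost nothing in this comparison.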


Thus, comparing the two sets of results for Wilson Smith’s data, 
calculating the error from an average slope is a slightly more accurate 
procedure than calculating it from the slope of a single experiment 
with 100 animals receiving the Standard Preparation and 100
receiving the serum being tested. Using all available data, the use
of an average slope gives a still smaller error, because Trevan and
Brown’s slopes are greater, leading to a smaller standard deviation.
The figure of 57-176 per cent., when 100 mice receive the Standard
Preparation and 100 the preparation being tested has been proposed 
for insertion in the Pharmacopoeia and should be an outside estimate. 

(b) The Method of Intravenous Injection into Mice of the Serum being
tested, followed by the Intraperitoneal Injection of the test dose
of the Culture.

The calculations are based on the data of Morgan and Petrie 
(ref. 22). 

First, each of two comparative tests of two sera was used. In
the former there were 70 mice receiving one serum and 60 receiving 
the other; in the latter there were 200 mice receiving each serum. 
The percentage of mice protected by each dose was converted into 
a normal equivalent deviation, or probit, and plotted against the 
logarithm of the dose to the base 10. 

Regression lines were calculated and the slopes estimated. The 
values of $\sigma_M$ thus obtained were multiplied by $\sqrt{130/100}$ and 2
respectively to obtain the value corresponding to 100 animals in all.

The results were as follows :— 





Table X.

Test.    $\sigma_M$.    $2.576\,\sigma_M$.    Limits of Error (per cent.).

  1       0.1396         0.3596                44-229
  2       0.0821         0.2114                61-163


The average of these two values is 51-197 per cent. 

In view of the availability of further data comprising nine single 
tests, the situation was explored further. 

The average slope was obtained: (1) for the single tests, (2) for
the above two comparative tests, and (3) for all tests. If an average
value of the slope is used in equation (42) instead of the value
actually obtained from the comparative test itself, $\sigma_b^2$ must be
equated to the real variance in slope added to the variance of estimate
of the average (the latter quantity being a small correction). The
real variance in slope may be obtained by deducting from the
variance of the slopes of all available experiments the average
variance of estimate. The results were as follows:


Table XI.

                                          Single     Comparative    All
                                          Tests.     Tests.         Tests.

No. of Tests                              9          2              11
Mean b                                    3.78       2.65           3.57
Total Variance of Slope                   8.997      1.209          7.527
Average Variance of Estimate of Slope     4.737      0.411          3.950
Variance of Estimate of Mean b            0.526      0.205          0.359
$\sigma_b^2$                              4.787      1.004          3.936
$\sigma_b$                                2.19       1.00           1.98


Equation (42) was now used to calculate the error, taking
$w = 0.55$, $\bar{y}_2 - \bar{y}_1 = 0$ or $0.25$. The average value of the weighting
coefficient in all these tests was 0.55, while $\bar{y}_2 - \bar{y}_1$ was 0.073 in one
of the comparative tests and 0.265 in the other.

The results were as follows :— 


Table XII.

                                                  $\sigma_M$.   $2.576\,\sigma_M$.   Limits of Error
                                                                                     (per cent.).

$b = 3.57$, $\sigma_b = 1.98$,
  $\bar{y}_2 - \bar{y}_1 = 0$                     0.0770        0.1985               63-158

$b = 3.57$, $\sigma_b = 1.98$,
  $\bar{y}_2 - \bar{y}_1 = 0.25$                  0.0863        0.2222               60-167








Thus working out the error with an average slope is a rather 
more accurate procedure than calculating it from the slope of an 
experiment with 100 animals in all. The slope of the first comparative
test was much smaller than the average, leading to a
considerably larger standard deviation. The figure of 51-197 per
cent., when 20 mice are used in each of the five groups, has been
proposed for insertion in the Pharmacopoeia, and should be an 
outside estimate. 

It should be noticed that the second method gives much the 
same accuracy with 100 animals as the first does with 200, if we 
compare the errors calculated from an average slope. For 200 
animals in all the figure of 51-197 per cent. is lowered to 62-162
per cent., which is also somewhat more accurate than the value
of 57-176 per cent. given by the first method.
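
The lowering of the limits on doubling the number of animals follows from the standard error varying inversely as the square root of the number used, the same relation that underlies the factors $\sqrt{2}$ and $\sqrt{130/100}$ employed earlier. A minimal sketch, in which the function names are assumptions and the half-width is recovered from the upper limit of 197 per cent.:

```python
from math import log10, sqrt


def rescale_half_width(half_width, n_old, n_new):
    """Half-width on the log10 scale when n_old animals become n_new."""
    return half_width * sqrt(n_old / n_new)


def limits_from_half_width(half_width):
    """Percentage limits corresponding to a log10 half-width."""
    return 100 * 10 ** (-half_width), 100 * 10 ** half_width


h_100 = log10(197 / 100)                  # half-width with 100 animals in all
h_200 = rescale_half_width(h_100, 100, 200)
lower, upper = limits_from_half_width(h_200)
print(round(lower), round(upper))         # agrees with the 62-162 quoted above
```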

In conclusion, I must express my indebtedness to all my colleagues
on the Pharmacopoeia Commission’s Sub-Committee on the
Accuracy of Biological Assays for much patience in exchanging
points of view. The calculation of the error in the first test for
Vitamin C was the work of Professor Gaddum, whose knowledge and
experience have been invaluable throughout. Also I must acknowledge
the help of my assistant, Miss Nancy Goodman, who has carried
out much laborious arithmetical work.


References.

(1) Trevan, Proc. Roy. Soc., B, 1927, 101, 483.
(2) Gaddum, Med. Res. Council, Spec. Rep. Series, 1933, No. 183.
(3) Marks, Quart. J. Pharm. Pharmacol., 1932, 5, 255.
(4) Hemmingsen, Quart. J. Pharm. Pharmacol., 1933, 6, 39 and 187.
(5) Tables for Statisticians and Biometricians, 1924, 1.
(6) Bliss, Ann. Appl. Biol., 1935, 22, 138.
(7) Fisher, ibid., 1935, 22, 164.
(8) Bliss, ibid., 149, 151-2.
(9) Gaddum, ibid., 21.
(10) Bliss, ibid., 154-6.
(11) Gaddum, ibid., 22.
(12) Fisher, Phil. Trans., 1922, A, 222, 319.
(13) Hartley and Bruce White, Quart. Bull. Health Organisation, League of
     Nations, 1935, 4, 13.
(14) Key and Elphick, Biochem. J., 1931, 25, 888.
(15) Key and Morgan, ibid., 1933, 27, 1030.
(16) Bourdillon, Bruce, Fischmann and Webster, Med. Res. Counc. Spec.
     Rep. Series, 1931, No. 158.
(17) Dyer, Quart. J. Pharm. Pharmacol., 1931, 4, 503.
(18) Key and Morgan, Biochem. J., 1932, 26, 196.
(19) Hume, Pickersgill and Gaffikin, Biochem. J., 1932, 26, 488.
(20) Gen. Med. Counc., Brit. Pharm. Comm. Rep. Comm., 1936, 9, 56 and 60;
     10, 6 and 8.
(21) Wilson Smith, J. Path. Bact., 1932, 35, 509.
(22) Morgan and Petrie, Brit. J. Exp. Path., 1933, 14, 323.





Discussion on Dr. Irwin’s Paper. 

Professor Gaddum: I should like to thank Dr. Irwin for collecting
together a large number of statistical methods that are likely to
be useful to pharmacologists. I think that there are many people 
who will find it very useful to be able to refer to Dr. Irwin’s paper 
when they are trying to decide on what method is particularly suitable 
for their problems. I understand that I should be following the 
traditions of this Society if I could make some drastic criticisms of 
Dr. Irwin’s paper; unfortunately, it is difficult for me to do so. 
Many of the data which he has presented are based on the report of a 
Committee of the Pharmacopoeia Commission, of which I was a 
member, and I am jointly responsible, with Dr. Irwin and others, 
for a great deal of this work. I feel, therefore, that I ought to help 
to protect him from criticism, rather than to expose him to more 
criticism; but I hope that that criticism will be forthcoming, because 
I understand that one of his objects in reading this paper is to invite 
criticism from the statistical side. It is very important that the 
statistical methods used in these official calculations should be the 
best possible. 

Dr. Irwin’s paper reveals the fact that there is still one question
on which we disagree. In calculating a regression line for data
connecting probits with the logarithm of the dose, the obvious
procedure is to fit a preliminary line and to calculate the weight
of each observation, not from the observation itself, but from the
observation as corrected by this preliminary line. This procedure
has been recommended by Bliss (1935) and adopted by Irwin. It
gives an approximation to the result of reducing $\chi^2$ to a minimum.
A better approximation to the solution given by the method of
maximum likelihood is obtained by calculating the weights directly
from the original observation (Gaddum, 1933). The reasons for
this may be summarized as follows:

In the figure Y is the true value and y the observed value of a variable.
The continuous line is the probability distribution of y about Y.
The dotted line is the “curve of likelihood,” showing the likelihood
of different values of Y when y is given. Neither curve is necessarily
normal.

The method of maximum likelihood involves finding the solution
giving a maximum value of $S(\log f)$, or a minimum value of $S(-\log f)$.





The method of least squares, as usually applied, involves finding a
solution giving a minimum value to $S\{W(y-Y)^2\}$, where $W$ is the
weight. A sufficient condition for the equivalence of these two
methods is that $-\log f = KW(y-Y)^2$, where $K$ is a constant.
If this condition is satisfied, $f = e^{-KW(y-Y)^2}$, and the probability
distribution is a normal curve with standard deviation $(2KW)^{-1/2}$.
Such an argument can be used to justify the use of the method of 
least squares when the probability distribution is normal. There 
is no argument, based on likelihood or any other fundamental 
conception, which would justify the use of the method of least 
squares, with weights calculated from Y, when the probability 
distribution is not normal, or approximately normal. 

If y and Y are probits, the shapes of the curves can be calculated.
The probability distribution is normal only when the number of
animals used is infinite. With the numbers used in actual experiments
the probability distribution is discontinuous, and grossly
asymmetrical. In the method of calculation proposed by Bliss,
$W$ is calculated from $\sigma_Y$, which is taken as
$\sigma_P \times \dfrac{dY}{dP}$ or $\sqrt{\dfrac{PQ}{n}}\,\dfrac{dY}{dP}$.
Actually $\sigma_y$ is infinite for finite values of $n$, since infinite values of
Y correspond to finite probabilities. In any case, the application
of the method of least squares in this form is unjustifiable, since the
probability distribution is not normal.
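
The two ways of assigning weights under discussion may be set side by side in a short sketch. It assumes the weight to be the reciprocal of $\sigma_Y^2$, so that with $dP/dY = Z$ (the normal ordinate) the weight of a group of $n$ animals is $nZ^2/PQ$; normal equivalent deviations are used in place of probits, and the function names are illustrative. Giving zero weight to a group with no deaths or all deaths is a convention of this sketch, being the limiting value of the formula, and is not taken from the text.

```python
from statistics import NormalDist

_STANDARD_NORMAL = NormalDist()


def probit_weight(ned, n):
    """Weight n * Z^2 / (P * Q) evaluated at a given normal equivalent deviation."""
    p = _STANDARD_NORMAL.cdf(ned)
    z = _STANDARD_NORMAL.pdf(ned)
    return n * z * z / (p * (1 - p))


def weight_from_line(expected_ned, n):
    """Weight taken from the value read off a preliminary line (Bliss)."""
    return probit_weight(expected_ned, n)


def weight_from_observation(deaths, n):
    """Weight taken from the observed proportion itself (Gaddum)."""
    if deaths == 0 or deaths == n:
        return 0.0  # observed deviation is infinite; limiting weight is zero
    return probit_weight(_STANDARD_NORMAL.inv_cdf(deaths / n), n)


# Hypothetical group: 20 animals, 4 deaths, preliminary line predicting NED = -0.5.
print(round(weight_from_line(-0.5, 20), 2))
print(round(weight_from_observation(4, 20), 2))
```

The two weights coincide only when the observed point lies on the preliminary line; the further the observation falls from it, the more the two procedures diverge.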

A possible method of evading this dilemma is based on the
surprising discovery that the curve of likelihood, unlike the probability
distribution, is a very close approximation to a normal curve with
standard deviation $\sigma_y$ (Gaddum, 1933). With this approximation

$$f = \frac{1}{\sigma_y\sqrt{2\pi}}\, e^{-\frac{(y-Y)^2}{2\sigma_y^2}}$$

or

$$-\log f = \frac{(y-Y)^2}{2\sigma_y^2} + \text{a constant}.$$

The solution given by the method of maximum
likelihood is that corresponding to the minimum value of
$S(-\log f)$, which is the same thing as that corresponding to the
minimum value of $S\left\{\dfrac{(y-Y)^2}{\sigma_y^2}\right\}$. This solution can be obtained by
calculating the weights from the observed points and proceeding with
the calculation as in the ordinary method of least squares. There
are thus two methods of applying the method of least squares, of
which one is justified when the probability distribution is normal,
and the other when the curve of likelihood is normal.

I think it may be interesting to this meeting to hear something
of the history of the application of statistics to these methods of
biological assay, a history that has tended to repeat itself every
time a new biological test has been introduced. The old-fashioned
way of measuring the toxicity of an unknown substance is to inject
a series of different doses into each of a series of animals. If the
doses are fairly widely spaced, there is no overlapping. The results
are not very accurate, but not therefore useless. If you want to
know whether a given hormone is mostly in the precipitate or the
supernatant fluid, the test may show that there is much more
in the precipitate than in the fluid, and that may be sufficient for 
your purpose. It was not realized at first that it was possible to 
make accurate estimates of toxicity, but in 1927 Dr. J. W. Trevan 
published an important paper which changed the situation entirely. 
He applied statistical methods to data of this kind, measuring the 
variability of the animals, and we all had to start learning statistics 
ourselves. He found that when the variation of the animals was 
measured by plotting the mortality against the dose, the variation 
was much larger than anyone had supposed. This aroused a great 
deal of indignation at first. Everyone said, “ Our animals are not 
like that; we never notice any discrepancies like yours.” The 
indignation was confined to bald statements of disagreement; when 
actual measurements were made, it was found that the variation 
really was large. This conclusion was eventually accepted by most 
people, and it was agreed that individual animals varied. A “50
per cent. dose” was introduced, a dose which would act on 50
per cent. of rats, for example. It was soon found that populations
of rats varied. The only way to overcome that was to adopt a 
standard of preparation. The Permanent Commission of the 
League of Nations on Biological Standardization has prepared 
standard preparations of drugs and other remedies, so that when 
someone in England speaks of a “unit,” he means the same thing as
someone in another country, and our Chairman, Dr. Hartley, has 
had more than anyone to do with the preparation of these standards 
for use all over the world. 

Dr. Trevan introduced methods of assay in which the variation 
was taken into account but the variation was assumed to be constant. 
It was found in time that the variability of the animal itself might 
vary; the slope of the curve is not constant. 

Another complication is that the shape of the curve is not constant. 
It is quite a close approximation to say that the logarithm of the 
susceptibility of individual animals is normally distributed; but if 
you can get enough animals and determine the curve accurately 
enough, it is generally found that that is not strictly true. 

In the measurements of insecticides it is quite easy to measure 
the shape of these curves accurately. It is not much more difficult 
to put fifty flies into a bottle than it is to put one, and the percentage 
mortality can be measured fairly accurately. On the other hand, a 
method has been used for measuring a hormone in the pituitary 
which causes ovulation in a rabbit after injection. In order to 
determine whether ovulation occurs in one rabbit, it is necessary 
to do an aseptic operation every day, opening the rabbit’s abdomen, 
and looking to see if it has ovulated or not. Several hours’ work is 
required, therefore, for each animal, and it would be impossible to 
measure the shape of the dose-effect curve accurately. It is necessary 
to compromise, and it is often sufficient to use very few animals. 

The application of these statistical methods helps to show how 
much accuracy can be obtained with a given number of animals, 
and so to give some sort of idea of how many animals have to be 
used.





I hope Dr. Irwin’s paper will help people to make this kind of
calculation, and I have much pleasure in proposing a hearty vote of 
thanks to him. 

Dr. Coward : It gives me great pleasure to second this vote of 
thanks to Dr. Irwin. It really is not very long ago that biology and 
chemistry joined forces and produced a useful study of physiological 
chemistry, later called biochemistry, and more recently still biology 
and physics have joined forces. Now mathematics and biology 
are joining forces and producing biomathematics. 

It seems to me that one of the great advantages of this combination 
is to enlighten two sets of people. One set has a most pathetic faith 
in the accuracy of biological assays; the other set believes not at 
all in biological assays, but puts down any discrepancy between 
biological and chemical assays as due to the error of the biological 
assay. Really it is to convince the one set of people how much too 
great is their faith, and the other set of people how much too little 
is their faith in biological methods, that I think one of the great 
advantages of Dr. Irwin’s paper will be seen. 

I have great pleasure in seconding this vote of thanks. 

Mr. Yates also thanked Dr. Irwin for his excellent paper. He 
thought that comprehensive reviews of the statistical methods 
appertaining to particular subjects were extremely valuable to 
practical workers. Dr. Irwin’s paper was an excellent example of 
such a review: clear, lucid, and satisfying the test that it could be
understood by workers who had no previous knowledge of statistics. 
He was particularly glad that Dr. Irwin had not confined himself to 
a description of statistical machinery, but had also given an account 
of the material on which the machinery was used. 

The main point on which he found himself differing from Dr. 
Irwin was in the matter of determining the gradient of response b and 
the standard error from a preliminary set of experiments, afterwards 
using the fixed values so determined for the actual assays. One of 
the great steps forward in agricultural experimental technique was 
that of allowing each experiment to determine its own error, and 
also to determine the magnitude of any corrections that might be 
made for concomitant observations such as plant number. Therefore 
he could not help feeling that all biological assays should be so arranged 
that at least two points on the response curve were included. This 
could easily be effected by using two concentrations of the standard 
preparation, so chosen that they might be relied on to bracket the 
response to the test preparation. If b were determined from one 
such experiment, the accuracy would clearly be lower than if b were 
accurately known, and did not change from experiment to experiment;
but if this assumption of unchanging b were, in fact, correct,
the results of previous experiments could be combined to determine
this fixed value of b. Consequently no loss in precision would result.
At the same time, confirmation (or otherwise) would be obtained as
to the constancy of b, and the need for a preliminary and somewhat
extensive set of experiments to determine b would be avoided. 





Similar remarks were applicable to the determination of the
standard errors. It had been found extremely valuable in agricultural
experiments on field crops to so design each experiment that
it provided its own estimate of error. The statisticians had had to 
fight a hard battle to gain this point, but he believed he was right 
in saying that it was now accepted by all the more progressive 
agricultural experimentalists. It would be a salutary first step 
in a movement for the improvement of the technique of experiments 
on animals if similar rigorous methods of randomization were 
adopted. His own contact with experimental work on animals 
indicated that (in agriculture, at any rate) there was far too much 
juggling with the arrangement in the interests of convenience and 
so-called accuracy. Quite apart from anything else, such juggling 
immediately opened the door to conscious or unconscious dishonesty 
on the part of the experimenter. 

He would also like to ask Dr. Irwin whether use of the
t-distribution would not have avoided the involved statement
of the standard error of the standard error on p. 35. What
about the standard error of the standard error of the standard
error?

Mr. Bartlett wished to add his compliments to those accorded 
to Dr. Irwin for his very full and valuable paper. He would confine 
his actual remarks to the statistical side, and in particular to the 
method which he would call the probit log. dose method, with which 
he had some acquaintance, owing to its use in experimental work 
on the toxicity of fumigants to insects. 

If one considered the questions of $\chi^2$, it seemed to him that
Dr. Irwin did not perhaps stress enough the possible heterogeneity
of the animals with which he was dealing. He mentioned that they
could vary, but afterwards, if ever he got a significant $\chi^2$, he seemed
to consider that it was due to the regression line not being linear,
whereas, in experiments with cruder stock it was very often the
case that the linear relation appeared to be satisfied, but there was a
highly significant value of $\chi^2$, because even if normal, the population
might not be sampled at random. Insects might be more or less
grouped together in batches. He supposed that theoretically the 
possibility of bias could be overcome by taking a batch of insects 
and deciding afterwards what dose should be given them at random, 
but whether such a procedure was feasible would obviously depend 
very much on the practical nature of the problem. He raised this 
point because it was rather a difficulty. 

Another point was the question of the degrees of freedom for $\chi^2$.
Dr. Irwin criticized Bliss’s treatment of this, but he did not seem to
have realized all the implications; one reason why the $\chi^2$ distribution
became altered was the small number there might be in some groups.
Thus if the number were so low as to give an expectation of one, and
none were observed, the maximum contribution to $\chi^2$ on that side
would be one; therefore the total $\chi^2$ would tend to be restricted in
its variation, which would appear to lead to a diminution in the
number of degrees of freedom which should be given. The point
had not been investigated very much, but was rather more difficult
than Dr. Irwin supposed.

Another purely statistical point was that of fitting the regression 
line. Dr. Irwin suggested that he would prefer to fit by least squares, 
whereas Mr. Bartlett would prefer to fit by eye from a graph, chiefly 
in order to save time. In extenuation of this, he would say that if 
the points were close to a straight line, the line obtained from the 
graph was extraordinarily good if reasonable care were taken, and 
it was, after all, only intended as a preliminary estimate. 

With regard to Professor Gaddum’s query of what was the 
maximum likelihood estimate, Mr. Bartlett thought he would have 
agreed with Dr. Irwin that if the whole data were considered at 
once, the maximum likelihood method would not lead to using the 
weights in the observed values; but even if a maximum likelihood 
method ever had a bias, there was no reason why one could not 
correct for it, and the method of using observed weights did tend to 
give a slight bias. 

Mr. Bartlett said there was one general remark he would like to 
make, and that was that on referring to Dr. Bliss’s paper it would 
be seen that he gave tables of probits, etc., in his paper which had 
become extremely useful; the reprint he had himself had become so 
disreputable through constant use that he had had to have the tables 
photographed. He would make a general plea to statisticians who 
incorporated such tables in their papers that these should not be put 
in the body of the text unless arrangements were also made for them 
to be printed, say, as an appendix, in the same kind of way as 
statisticians published them in book form, or as Professor Fisher 
gave them at the end of his book, where they could be cut out and 
used separately from the book itself. 

There was a point in connection with Dr. Bliss’s paper which he
would like to mention. In Bliss’s original treatment of the data
re-discussed by Dr. Irwin there was a small technical mistake
(p. 163 of Bliss’s paper) in regard to the number of degrees of freedom
for t, which might perhaps be pointed out here. Since $\chi^2$ was
insignificant, a theoretical form of the variance was used, and the
value of t should be 1.96.

Mr. Bartlett had not been able to follow Dr. Irwin’s statement on
p. 33 in connection with Table II, that “the amount of antitoxin that
will neutralize a test dose of 4.4 milligrams is certainly between 0.9
and 1.1 units.” On looking at the Table, it would be seen that if
the American solution was assumed equivalent to the International
solution, it certainly was not so, and in any case there was no reason
why a mouse selected subsequently should not possibly die at the
1.1 antitoxin dose.

Dr. Trevan said it was an enormous relief to find a professional 
statistician engaging himself in these problems; it badly wanted 
doing. He was sorry to see that dreadful term “ minimum lethal 
dose ” referred to in the paper. There had been more trouble over 
the use of that term than almost anything else. He thought that he 
had “scotched” it, but unfortunately it had cropped up again in a
statistician’s paper. The term was coined by Ehrlich, who considered
it as the minimum dose which would kill a guinea-pig. At the time
he was under the impression that all guinea-pigs were the same, and
that a lethal dose for the one guinea-pig was the quantity that
would kill another. His opinion had altered later, but most of the
minimum lethal doses collected in the literature were doses which
were very poor approximations to the LD50. Dr. Trevan did suggest
once that for the particular use for which the term minimum lethal
dose had been used by Dr. Irwin, the term “individual lethal dose”
would be better, and he felt very strongly that Dr. Irwin would add
greatly to the confusion if he stuck to the term “minimum lethal
dose” in his printed paper.

Another difficulty was that one had to remember that a large 
number of biologists approached problems from the naturalist’s 
point of view, rather than from that of the mathematician, and Dr. 
Irwin had possibly no idea how incapable most biologists were of 
understanding anything but the simplest mathematical argument. 
This was partly training and partly psychological, and people could 
be divided into two classes—those who were mathematical, and those 
who were not. The real difficulty in the practical application was 
the difficulty that the unfortunate biologist had in understanding 
the language of mathematics. He himself had reached the stage 
in which he had got to the second moment of the distribution; he 
understood what a standard deviation was, and he had got quite a 
lot of fun out of that, but most biologists had not got past the 
first moment. For this reason he had himself been called a 
“ statistician ” by some of his biological friends ! There were a few 
biologists to whom even the mean was something of which they 
were suspicious. 

Was it possible to avoid the laborious arithmetic of fitting curves
by the least square method by some development of Kärber’s
method, the working out of which only involved simple addition?
The error was not much greater than that of the methods of least
squares, except in so far as one had to put up animals over the whole
of the frequency distribution. By Kärber’s method those who were
repelled by the calculating machine, or by mathematical tables,
could get some hope. Another reason for paying more attention to
Kärber’s method was that the problem often came to a pharmacologist
in a form in which Kärber’s method was the method of choice.
Dr. Trevan said that, as no previous information as to the LD50 or
the slope was available, he had been working lately on the activity
of certain substances in the ephedrine group, where there were
twenty-four substances to be compared. The arithmetic of working
out the LD50 by the least square method would take too much time,
but by Kärber’s method he could get an estimate which would satisfy
him. He would be grateful if Dr. Irwin could give some indication
of how they could estimate the error of the estimate of X by the
development of Kärber’s method given in Gaddum’s book.
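
For readers unfamiliar with it, the kind of calculation Dr. Trevan has in mind may be sketched as follows. The form used is the one commonly quoted as the Spearman-Kärber estimate; whether it is identical with the development in Gaddum’s book is not confirmed by the text, the figures are hypothetical, and no estimate of error is attempted, that being precisely the point on which guidance is asked.

```python
def karber_log_ld50(log_doses, fractions):
    """Spearman-Kärber estimate of the log dose affecting 50 per cent.

    Each interval between successive doses contributes its midpoint,
    weighted by the rise in the proportion responding across it. The
    doses must span the whole distribution, running from no response
    to complete response.
    """
    if len(log_doses) != len(fractions) or len(log_doses) < 2:
        raise ValueError("need matching lists of at least two doses")
    if fractions[0] != 0.0 or fractions[-1] != 1.0:
        raise ValueError("proportions must run from 0 to 1")
    total = 0.0
    for i in range(len(log_doses) - 1):
        midpoint = (log_doses[i] + log_doses[i + 1]) / 2
        total += (fractions[i + 1] - fractions[i]) * midpoint
    return total


# Hypothetical series of four equally spaced log doses.
print(karber_log_ld50([0.0, 1.0, 2.0, 3.0], [0.0, 0.25, 0.75, 1.0]))
```

The requirement that the series run from no response to complete response is the practical cost mentioned above: animals must be put up over the whole of the frequency distribution.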

Professor Burn said that Dr. Trevan had referred to the
difficulty of the biologist in using these methods. The statisticians
were responsible because they chose their technical terms with so
little care. There was an example of this on p. 3 of the paper,
where Dr. Irwin said, “The response in an individual animal may be
a continuous variate.” All he meant was that the response in an
individual animal might be one which could be measured. If that
was said to a biologist, he understood it, but he would not be likely
to understand “a continuous variate.” Professor Gaddum had
talked about the “quantal response,” when he meant a response that
could not be measured.

More ought to be done to try to determine experimental errors 
by actual experiments made jointly in different laboratories. A 
great deal of good could come from that if more efforts were made to 
arrange investigations of that kind. 

The Chairman said that before putting the vote he would like 
to say one word. As those present had seen and heard for themselves,
since the services of Dr. Irwin had been secured for the work on
the Sub-Committee on the Accuracy of Biological Assays, he had 
not been idle. As Chairman of the Committee for which Dr. Irwin 
had been working, and more especially on behalf of the Fellow 
Members of that Committee as well as on behalf of the Pharmacopoeia 
Commission itself, he would like to say how much they all appreciated 
the devoted and unselfish way in which he had helped them to 
approach their problems, to see them from another angle, and help 
them to overcome their difficulties. 

The vote of thanks to Dr. Irwin was now put to the meeting and 
carried unanimously. 

Dr. J. O. Irwin : I must thank you all very much for the way
in which you have received this paper and for the kind words you
have just said about me. I was not expecting such a favourable
reception as I have actually had, and I am very grateful for all the 
criticisms which have been forthcoming. As far as most of them are 
concerned, I should like to reply, as is customary, in the pages of the 
Supplement to the Journal , but there are one or two things which I 
should like to answer now. 

Referring to what Professor Burn said about the use of the
word “ continuous variate,” I think it is an expression which most
statisticians would understand and, after all, this paper is appearing
in a statistical journal. If I had been writing in a biological journal, 
it might have been advisable to put a footnote explaining what it 
meant. 

With regard to Dr. Trevan’s remarks on the use of the term 
“ minimum lethal dose,” historically speaking I stand corrected. I
think that I have used the term in a logical way in this paper, and 
that anybody who had never heard of the term before would know 
what I meant by it. The trouble is that they might have heard of 
it before. I think Dr. Trevan is right, therefore, and the term 
“ individual lethal dose ” would be better and less open to confusion,
and I shall make that change before publication. 





With regard to Mr. Bartlett’s criticisms, I think I will leave these 
until I reply through the pages of the Supplement to the Journal. 

With regard to Mr. Yates’s remarks concerning the slope, when 
there is the slightest doubt we use two doses of the standard and two 
doses of the test preparation, and if he looks at the details dealing 
with anti-pneumococcus serum, he will see that the errors of each 
assay have been determined from the data of that assay only, and 
that the magnitude of those errors has been compared with the 
magnitude of the errors one would obtain by the alternative method of 
calculating the errors from the slope of past experience. There are 
two alternatives which have to be weighed one against the other. 
With a small number of animals one might prefer to use an estimate 
of slope obtained from past experience with the appropriate error 
of that slope. If the experiment has been fairly accurately done 
with a large number of animals, one might prefer the other method, 
but the possibility of significant changes in slope should not be 
ignored. That method is the better which leads to the smaller error. 

I will consider the other question raised about the fiducial limits 
of the mean, when I reply in writing. 

Coming finally to the remarks of Professor Gaddum, I feel rather 
guilty in not really having tackled the point that he raised before 
and tried to get to the bottom of the difference between those two 
methods of fitting the regression line. I will examine the question 
again, and try to reply to him at length in writing. 

I thank you all very much indeed for your criticism, and for the 
kind way in which you have received the paper. 

The following contribution was received after the meeting from 
Dr. Neyman 

Dr. Irwin has given us an excellent paper, and there is no doubt 
that taking into account and trying to estimate the variability of 
experimental animals means a great deal in the advancement of 
toxicology. It seems, however, that we may and we should go a 
little further. Not only the susceptibility of animals is subject to 
variation. The variability of experiments with injections of toxic or 
other lethal material may, and very probably always does, depend 
upon the variability of the dose actually injected, which was intended 
to be of a specified size. Logically we may expect three different 
situations. 

(1) The variability in susceptibility of experimental animals 
is negligible compared to the variability of the dose. 

(2) The effects of variability of both the dose injected and 
the susceptibility of experimental animals are comparable in 
their effects and 

(3) The variability of the dose injected is negligible compared
with that of the animals. 

Dr. Irwin seems to have dealt primarily with the third situation. 
It may be useful to point out that the two first mentioned may 
present themselves in practice, and that they can be treated 
statistically. In order to make this contribution as short as possible,
I shall mention only the results obtained with regard to the extreme
situation (1). This is likely to occur when the lethality of the liquid 
injected is due to some highly virulent bacterium present in the 
liquid. In such cases the size of the individual minimum effective 
dose may be usefully measured by the actual number of bacteria 
present in the dose, and in some cases it may happen to be practically 
constant for all experimental organisms used. If this be so, then it 
is possible to deduce equations of the dosage-response curves corresponding
to cases when the minimum effective dose is say $X_0 = 1$,
$X_0 = 2$, $X_0 = 3$, etc. The curves of this kind have been deduced
and plotted by Miss K. Iwaszkiewicz and myself in a paper where
all the situations (1), (2) and (3) were discussed.* It was a more or
less pioneer work, and there are many points in the theory which 
could now be bettered. However, it seems interesting that the 
methods developed may bring correct and useful results, even in 
spite of the somewhat crude assumption concerning the invariability 
of the individual minimum effective dose. One of the sets of 
experimental data discussed in our paper represents the death rates 
of mice injected with varying doses of a culture of pneumococcus, 
observed by Dr. G. F. Petrie of the Lister Institute. The examination
of these data leads us to the conclusion that (1) the minimum
effective dose for all mice was $X_0 = 1$ pneumococcus and (2) that the
concentration of the pneumococci in the original culture was
$X = 297.3 \times 10^{6}$ per c.c. In the same year G. F. Petrie and
W. T. J. Morgan published a paper containing among other results
the check of the above conclusion.† They were dealing with the same
strain of bacterium, but with different sub-cultures of it. It follows 
that their new experiments could only confirm or disprove our con¬ 
clusion concerning the size of the minimum effective dose, not the 
concentration of the bacteria in the culture. 

The relevant data are compiled in the authors’ Table X, showing 
that out of 435 mice inoculated 187 mice died or at least showed 
external signs of illness. If our conclusion that the minimum effective 
dose was for all mice $X_0 = 1$ pneumococcus were true, then all the
248 healthy survivors would not contain any pneumococci in their 
bodies at all. Drs. Petrie and Morgan killed the surviving mice 
and examined them bacteriologically. Our estimate of the minimum 
effective dose proved not to be exact, as they found that some of the 
healthy mice were actually infected, but the inaccuracy seems to be 
rather small: in fact, the number of infected healthy survivors 
was found to be only 13 out of 248, i.e. 5.2 per cent. The paper
also provides evidence that the mice which proved to be non-infected 
could not have destroyed the pneumococci if they were contained 
in the dose injected. I think this result is rather encouraging, show¬ 
ing that the situation (1), and therefore (2), are not imaginary, and 
that, taking into account the variation in the lethal power of the 
dose injected, we may be able to carry the analysis further. In
particular, it seems possible to distinguish such cases where the same
average lethality of two liquids is due (a) to a high concentration
of bacteria with low virulence and (b) to a low concentration of
highly virulent bacteria. If the bacteria we are dealing with are
recognizable, then the problem may be solved by other methods.
But I understand that it is not always known what micro-organism
is causing a particular infection. Then the methods mentioned may
be useful. Of course, these methods could not be considered as
fully developed; they require further analysis and improvement.

* K. Iwaszkiewicz and J. Neyman, “ Counting Virulent Bacteria and
Particles of Virus,” Acta Biologiae Experimentalis, Vol. VI, 1931.

† G. F. Petrie and W. T. J. Morgan, “ A Quantitative Analysis of the
Lethal Power of a Strain of Type I Pneumococcus,” British Journal of
Experimental Pathology, Vol. XII, 1931, p. 447.

I wish to join with the other contributors in complimenting Dr. 
Irwin on his very interesting paper. 

Dr. Irwin subsequently added the following :— 

I have re-examined the question of weighting raised by Professor
Gaddum. I agree with him that that method of weighting will be
best which gives a result nearest to the maximum likelihood solution.
By way of example, let us suppose that 100 units would produce a
50 per cent. response, the true slope of the curve being 3 probits per
tenfold increase of dose. The equation to the true curve is then

r - 5 = 3 (x - 2)

and the log-doses corresponding to 30 per cent. and 70 per cent. of
responses are respectively x = 1.8252 and x = 2.1748. Now, suppose
that in a particular experiment two groups of 10 animals are put on
each dose, and that the observed responses are 2 and 4 out of 10 at
the lower dose and 6 and 8 out of 10 at the higher dose. The maximum
likelihood solution for the slope is b = 3. Using weights calculated
from the observed responses gives b = 2.93. Fitting a line with
this slope and re-calculating the weights (which are now all equal)
gives b = 3.13, and this is the final approximation, since, all the
weights being now equal, no further change can be made.
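The two log-doses quoted in this example can be checked numerically. The following is a minimal sketch, assuming only that a probit is the normal equivalent deviate plus 5 and that the true line is the one written above; the function name and its default arguments are illustrative and do not come from the paper.

```python
from statistics import NormalDist


def log_dose_for_response(p, slope=3.0, log_ed50=2.0):
    """Log-dose at which a fraction p responds on the probit line
    probit - 5 = slope * (x - log_ed50)."""
    # Probit = normal equivalent deviate + 5.
    probit = 5.0 + NormalDist().inv_cdf(p)
    return log_ed50 + (probit - 5.0) / slope


x30 = log_dose_for_response(0.30)  # lower dose of the example
x70 = log_dose_for_response(0.70)  # higher dose of the example
print(round(x30, 4), round(x70, 4))
```

The two doses fall symmetrically about x = 2, the log-dose of 100 units, which is why the recalculated weights in the second approximation come out equal.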

This result, as far as it goes, confirms Professor Gaddum’s 
opinion. Using weights calculated from the observed responses 
gives a result closer to the maximum likelihood solution. But this 
does not quite settle the matter. There is reason to believe that 
this procedure will give lines which are on the average a little too flat; 
so that it may be that we have to choose between an estimate with 
relatively small variation but a slight bias, and an unbiassed estimate 
with a larger error. Further research into the point seems necessary. 
The matter could be settled if the maximum likelihood solution were 
readily obtainable, and it is quite possible to devise a method for 
this purpose. 

I have already answered Mr. Yates’s first criticism. His second 
is perhaps based on a misunderstanding. I was not concerned with 
limits of error for the mean of 11 experiments, for which the t-distribution
would have provided the simplest form of statement, but with
limits of error for one experiment when only 11 observations were 
available for estimating the variance. 

Mr. Bartlett has suggested the solution of his own difficulty 
arising from the danger that the population of animals may not be 
sampled at random. He is also quite correct in stating that the
problem of the small expectations near to zero or the large ones near
100 per cent. response is not easy. The effect on the distribution
requires further study. I must also thank him for calling attention 
to the somewhat loose statement in connection with Table II, p. 33, 
which I have corrected. 

I have adopted Dr. Trevan’s suggestion and substituted 
“ individual effective dose ” for “ minimum lethal dose ” through¬ 
out. 

Dr. Neyman’s contribution is most interesting. The first 
situation which he mentions, and the only one he discusses in his 
note, seems to belong more properly to the subject of bacterial 
counting than of biological assay. If we know the average number 
of bacteria in the dose of a highly virulent substance, and the number 
required to kill the animal, it is a straightforward application of the 
Poisson series to deduce the dosage-response curve. 
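The Poisson argument of this paragraph can be sketched as follows. The sketch assumes that the number of bacteria in a dose follows a Poisson series about its average and that the animal responds whenever the dose contains at least the required number; the function and argument names are illustrative only.

```python
import math


def response_probability(mean_count, lethal_number):
    """Chance that a dose holds at least `lethal_number` bacteria when the
    count in the dose is Poisson with mean `mean_count`."""
    below = sum(
        math.exp(-mean_count) * mean_count ** k / math.factorial(k)
        for k in range(lethal_number)
    )
    return 1.0 - below


# Dosage-response curves for required numbers of 1, 2 and 3 bacteria.
for lethal_number in (1, 2, 3):
    curve = [response_probability(m, lethal_number) for m in (0.5, 1.0, 2.0, 4.0)]
    print(lethal_number, [round(p, 3) for p in curve])
```

With a required number of one, the curve reduces to 1 - exp(-m), so an average of log 2 bacteria per dose gives a 50 per cent. response.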

In most biological assay work, however, variability of the dose 
is not likely to be important compared with variability of the animal. 
In the assay of anti-pneumococcus serum, for instance, where the 
object is to balance the effects of a virulent culture and of the anti¬ 
serum, it may take a hundredfold increase in dose to cover the whole 
effective range between zero responses and 100 per cent. of responses.

Nevertheless, the study of the intermediate situation envisaged 
by Dr. Neyman, where both effects are important, is clearly of the 
highest theoretical (and it may be practical) interest. 





Some Considerations of the Variability of Cotton Cloth Strength.

By A. W. Bayes, M.Sc.Tech. 
of Messrs. Ashton Bros. & Co., Ltd., Hyde. 

[Read before the Industrial and Agricultural Research Section of the Royal
Statistical Society, January 21st, 1937, Sir Percy Ashley, K.B.E., 
C.B., in the Chair.] 

(1) Cloth Strength as a Measure of Quality.

Tensile strength is the common measure of the quality of cotton 
fabrics. The quality of cloth for clothing and domestic purposes 
depends on such features as appearance and “ handle,” but these are
difficult to put into figures, so comparisons are usually made of the 
tensile strength and of the details of construction—namely, the 
threads per inch and the yarn “ count.” * The strength of a fabric
is dependent upon the strength of the yarns, the construction of the 
cloth, and the extent of chemical attack in finishing. In an open
structure, the stronger the yarn the stronger the cloth, but in a close 
structure the stronger cloth may be made from the weaker yarn 
because the fibres can be held by twist, or by the interlacings of the 
threads, and the fibre strength is more efficiently used by inter¬ 
lacings than by twist. High tensile strength is hardly ever the 
quality chiefly required in cloth. Consider a few examples. In 
light furnishing fabrics, such as curtains, soft twisted yarns giving a 
soft handle and even appearance are required; overalls usually 
become unfit for use by a burst at a worn place or at a seam; in 
industrial cloths, even when a high tensile strength is required in the 
final product, the highest strength of, say, a laminated cloth and 
synthetic resin material is not always achieved by the strongest cloth. 
Lastly, as a test for chemical tendering in finishing, tensile strength is 
not nearly so satisfactory as the fluidity test.¹ On the other hand,
the work of testing the quality of the cotton and the turns per inch 
is so slow that it is unsuitable for routine checking of cloth quality. 
So the tensile strength appears in most specifications, and is commonly 
used for checking the quality in the mill. 

The testing machine most commonly used stretches a strip of 
cloth to breaking point between two pairs of jaws. One jaw moves 
at a constant speed and the other is connected to a pendulum lever 
which loads the specimen as it is pulled by the traversing jaw. The
jaw speed is usually 18 inches per minute. The cloth sample tested
is usually 6½ inches or 7 inches long between the jaws, and from
1 inch to 6½ inches, but commonly 4 inches wide.

* The “ count ” is the measure of fineness of yarn. It is the length of yarn,
measured in hanks of 840 yards, that would weigh 1 lb. A yarn “ count ”
of 14s signifies that 14 x 840 yards of that yarn would weigh 1 lb.

Test strips a little wider than the width required are cut along the 
length of the cloth (warp way) and across the width (weft way) and 
the threads at the edges are frayed out till the sample is the correct 
width. The strength of cotton fabrics increases with the moisture 
content so the strips are “ conditioned ” at a standard temperature 
and humidity and tested in this standard atmosphere. The usual 
standards for the test room are 20° C. and 65 per cent, relative 
humidity, but as cloth when sampled usually contains less moisture 
than these conditions would give and takes a long time to come into 
equilibrium with the atmosphere, the standard condition of the
specimens can be obtained most conveniently by exposing them
overnight to a higher humidity of say 76 per cent.²

[Figure: test strips in relation to the cloth. Weft from the shuttle (adjacent threads from the same bobbin); warp from the weaver’s beam (adjacent threads from different bobbins).]

Usually at the mill five strips warp way and five weft way are cut 
from one yard of cloth, and the cutting is done so that no threads 
occur in more than one strip. Sometimes three half-yards are cut 
from different pieces and two tests each way are made from each. 
The following are typical strength specifications:— 

(1) “ Specification T.C. 5 A, Cotton Fabrics, Government Dept., 
Specification for Textiles and Clothing. Issued by the Technical 
Co-ordinating Committee on Textiles and Clothing—August 1933 ” 
(amended May 1934, refers to 66 different fabrics).

Inspection and testing—Tests to determine conformity with 
the figures given in the Schedule will be made under controlled 
conditions of humidity and temperature. The following standard 
is prescribed:— 

65 to 70 per cent. relative humidity at 70° F. The test
pieces will be conditioned in this manner for not less than 48
hours prior to test, and tested without removal from these
conditions.

“ Schedule

                                 Width,      Weight
                                 including   per Lineal   Ends and Picks   Tensile           Size of   Filling not
                       Pattern   Selvedge.   Yard.        per Inch.        Strength.*        Piece     to Exceed
Description.           No. T.    Inches.     Oz.          Warp.   Weft.    Warp.   Weft.     Tested.   Per cent.

Bluette, No. 1         500       27          7½           44      134      210     500       B         -
Calico, No. 1, Grey    503       31          21           64      68       160     140       C         -
Calico, No. 2, White   504       35          4            54      58       140     210       B
etc.

* Note :—Test taken on a machine of the Goodbrand type, power driven, 
having a constant rate of travel of 18 inches a minute. Size of piece tested :— 

(A) 2 inches x 6½ inches;

(B) 4 inches x 6½ inches;

(C) 6½ inches x 6½ inches.”


(2) “ Specification for bleached plain cotton sheet ” (for a
Steamship Company). 

“ Strength. The mean breaking load of the fabric shall not
be less than 100 lb. in both the warp and the weft direction, when
tested in the following manner :— 

Six specimens shall be cut in the direction of the warp and 
six in the direction of the weft. The size of specimen shall be 
2 inches wide and 7 inches between the jaws of the testing 
machine. 

The tests shall be made on a Goodbrand machine, the travel¬ 
ling jaw of which has a constant rate of traverse of 18 inches per 
minute. 

Before testing, the specimens shall be exposed for at least 
6 hours in an atmosphere of 65 to 70 per cent. Relative 
Humidity.” 

The first specification is much the worse. It does not state that 
the strength is in pounds, or whether the specified strength refers 
to a mean, or to individual strips, or whether it is a minimum. The 
cloth delivered must, however, comply with the specification so the 
manufacturer must see that the cloth strength is amply sufficient 
for the delivery to pass this ambiguous specification. 

The second specification is much better, but it still does not state 
how the test strips should be selected. Statistical analysis is needed 
to determine what specification conditions would give the manufacturer
reasonable latitude in the difficult task of cloth production
and yet safeguard the customer from abnormal fluctuations in 
quality, and to show what sampling methods are most appropriate 
for checking a delivery against a specification, and for checking the 
general quality of the mill production. I will devote my attention 
more to describing the problem than to solving it, first pointing out 
the likely sources of variation in the processes of cloth production,
and then providing some data. I hope that the statisticians contributing
to the discussion will make suggestions regarding the
solution of some of the questions raised.

Fig. 1. The white portion of the “ mixing ” and the circles among the warp
doffings represent the material which passes to other cloths than the one
under consideration.

(2) Description of Cloth Manufacture. 

I will describe the processes of cloth manufacture as carried out 
at the mill which supplied the data of this paper, and take one 
cloth as an example, namely the cloth tested for Table I. The 
methods of manufacture described are similar to those in general 
use in the cotton trade, but the figures of quantities and rates of
production quoted refer to the one cloth only. The spinning process
produces a thin yarn from a mass of cotton fibres by several consecutive
machines, each of which delivers the material on bobbins in
batches termed “ doffings.” These bobbins are fed to the bobbins
of the successive machine singly, so the cotton being fed to a machine
at any instant comes from bobbins which vary in size from full to
nearly empty, and also in this winding from bobbin to bobbin, the
outside of one bobbin makes the inside of the next. The flow of
material is therefore continually being pleated, so to speak, and the
general result is a continuous flow which cannot be split into batches.
After the yarn is spun, however, the production flow separates more
obviously into batches. The flow of production is summarized
diagrammatically in Fig. 1.

Table I.

Weekly Strength of Cloth during 1936.

Date of     Warp Strength (4 in. strip) in lb.     Weft Strength (4 in. strip) in lb.
Sampling.   1.     2.     3.     4.     5.         1.     2.     3.     4.     5.

4.1         625    630    600N   575    570N       400    395    415N   370    410N
10.1        605    596    585    595    605        405    410    385    375    420
17.1        585    560    580    565N   600N       425    405    400    375N   395N
25.1        605    595N   595    600    575        415    400N   400    425    405
31.1        535    565    600    540    605        395    395    410    425    405
8.2         690N   580    585    585    610N       425N   370    400    395    395N
14.2        600    560    575    570N   590        400    405    415    370N   405
21.2        535    580    545    565    545        415    395    395    450    420
27.2        635    605N   605    595N   625N       455    380N   450    400N   435N
6.3         600    590    645    575N   590        450    405    415    445N   420
13.3        640    620    635N   600    605        415    420    420N   405    430
20.3        645    505    555    575    575        395    445    410    365    400
27.3        595N   565    595    570    590N       385N   365    430    400    410N
3.4         515    580    560    565    560        395    390    365    350    355
9.4         530    510    345N   575    570        410    400    400N   420    370
17.4        590    575    545    565    545        405    410    375    375    375
24.4*       520    580    515    575    550        405    410    405    290    415
1.5         525    550    563    590    590        410    370    335    395    400
8.5         565    555    570    590    585        370    345    386    390    410
15.5        540    580    515    495    545        405    365    390    410    380
22.5        540    505N   565    580    550N       380    350N   410    350    405N
25.5        520    545    560N   550    570N       390    350    365N   360    360N
1.6         570    600N   585    555    525N       375    360N   360    370    380N
4.6         555    565N   520    570    540N       410    400N   410    400    390N
12.6        535    553    545    560    545        385    385    390    390    330
19.6        565    500N   560N   590    580        460    405N   430N   385    375
26.6        585N   530N   555N   540N   550N       385N   385N   370N   390N   405N
3.7         545    560    555    555    560        375    405    405    390    370
10.7        555    535    565    570    545        400    410    370    393    365
20.7        550    570    590N   593N   590        425    410    380N   395N   370
24.7        595N   585    575    595N   570        440N   405    380    415N   395
31.7        610    585N   590    565N   585        350    370N   405    375N   440
7.8         580    605    580    605    555N       400    410    395    395    410N
14.8        610    590    595    580    580        415    435    370    425    405
21.8        595    560    550    550    545        430    425    380    395    425
27.8        600    550N   505    600N   550N       425    425N   425    420N   410N
4.9         565    625    620    600    585        400    410    400    435    425
18.9        570    570    570    570N   585        405    400    335    435N   425
25.9        580    530    575    585    585N       410    410    420    350    335N
2.10        540    565    540    550    585        375    375    390    370    360
9.10        520    520N   545    600    540        400    405N   395    360    405
16.10       575N   535    555N   525    560        400N   410    405N   410    420
23.10       585N   560    555N   575    520N       395N   355    405N   400    385N
30.10       560    550N   550    540    565        410    365N   405    405    420
6.11        545N   595    545N   585N   545N       390N   370    385N   335N   385N
13.11       485    505    525    540    545        400    380    425    380    400
20.11       490N   595N   490    535    490        400N   400N   390    390    370
27.11       530    560    550    510    570        430    375    390    390    420
4.12        575    555N   585    533    535        390    400N   375    380    395
11.12       605    540    535    485N   535        380    375    370    390N   410

* No note of the N’s was kept in the four weeks 24.4 and 1, 8 and 15.5,
and the strips were not kept in their correct order.

Cotton. 

The same “ mixing ” of raw cotton is used for warp and weft of
several cloths. A mixing consists of 20 bales of 480 lbs. each.
Five mixings are used weekly, the waste loss is roughly 10 per cent.,
so the total yarn production from this mixing is 43,200 lbs. per week.
Each mixing contains bales from four different “ marks ” or qualities
of cotton in the proportion 4, 3, 2, 1. The bales are delivered in
100-bale lots; on the average, therefore, a 100-bale lot will last
four weeks, but a change from one lot to another may occur in one 
or other of the marks each week. The variation in quality of cotton 
from bale to bale in one lot is considerable, and it is quite likely 
that some variation remains after 20 bales are mixed together. The 
flow of cotton from the bales to yarn is fairly steady, with the longest 
“ pleat ” no more than half a day long, so probably at least half of 
each 20-bale mixing reaches yarn without being mixed with the 
cotton from the previous or subsequent 20-bale mixings. 
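The quantities in this paragraph can be checked with a few lines of arithmetic. This is a minimal sketch; the variable names are illustrative, and the reading of the "four weeks" average as the bales held in four lots divided by the weekly consumption is an assumption, not something the text states.

```python
BALES_PER_MIXING = 20
LB_PER_BALE = 480
MIXINGS_PER_WEEK = 5
WASTE_PERCENT = 10
MARK_PROPORTIONS = (4, 3, 2, 1)   # the four "marks" in each mixing
LOT_SIZE_BALES = 100

# Weekly yarn output after the waste loss.
raw_lb_per_week = BALES_PER_MIXING * LB_PER_BALE * MIXINGS_PER_WEEK
yarn_lb_per_week = raw_lb_per_week * (100 - WASTE_PERCENT) // 100

# Bales of each mark consumed per week, and how long a 100-bale lot lasts.
parts = sum(MARK_PROPORTIONS)
bales_per_week_by_mark = [
    BALES_PER_MIXING * MIXINGS_PER_WEEK * p // parts for p in MARK_PROPORTIONS
]
weeks_per_lot_by_mark = [LOT_SIZE_BALES / b for b in bales_per_week_by_mark]

# Assumption: average lot life = bales held in four lots / weekly consumption.
average_weeks_per_lot = (
    len(MARK_PROPORTIONS) * LOT_SIZE_BALES / sum(bales_per_week_by_mark)
)

print(yarn_lb_per_week, bales_per_week_by_mark, average_weeks_per_lot)
```

On this reading the lots of the most used mark are exhausted in two and a half weeks and those of the least used mark in ten, so a change of lot in one mark or another is to be expected in most weeks.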

Warp. 

The yarn “ count ” is 14s, that is, 14 hanks of 840 yards weigh
1 lb.

Five spinning machines make a “ doffing ” of 422 bobbins of 
yarn each every 2 hours throughout the 48-hour week; total 120 
doffings/week. Each doffing varies around 50 lb. net weight of 
yarn. Each bobbin varies around 0.12 lb. net weight and holds
about 1,400 yards of yarn. The yarn count is checked twice daily 
by measuring and weighing 120 yards of yarn from each of four 
bobbins from each frame. The standard deviation of these weights 
is of the order of 3! per cent. The spinning-machine wheels are 
changed to correct for changes in mean count of more than ± *£ 
per cent., but the testing-room humidity is not controlled, and
changes in weather probably result in real variations of the mean
count of ± 2 per cent. The yarn strength is tested on the same
120-yard lengths. The twist in the yarn is fixed by the wheels on
the machine and, except for small variations in slip in the drives
to the spinning-spindles, it is the same for all bobbins.
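The warp spinning figures hang together, as the following sketch shows. It uses only the quantities quoted above (count 14s, 840-yard hanks, 422 bobbins of about 0.12 lb., a doffing every 2 hours on five machines, a 48-hour week); the variable names are illustrative.

```python
COUNT = 14                  # hanks per lb.
YARDS_PER_HANK = 840
MACHINES = 5
BOBBINS_PER_DOFFING = 422
BOBBIN_LB = 0.12
HOURS_PER_DOFFING = 2
HOURS_PER_WEEK = 48

yards_per_lb = COUNT * YARDS_PER_HANK            # length of 14s yarn in 1 lb.
yards_per_bobbin = BOBBIN_LB * yards_per_lb      # compare "about 1,400 yards"
doffing_lb = BOBBINS_PER_DOFFING * BOBBIN_LB     # compare "around 50 lb."
doffings_per_week = MACHINES * HOURS_PER_WEEK // HOURS_PER_DOFFING

print(yards_per_lb, round(yards_per_bobbin), round(doffing_lb, 1), doffings_per_week)
```

The same conversion gives the weight of the 120-yard lengths used for the count check: 120 yards of 14s yarn is 120/11,760 lb., or about 71 grains.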

The yarn is wound on warpers’ bobbins on a winding-machine
with 100 spindles. A doffing of spinning-bobbins is distributed over
the whole 100 spindles and wound up, then a second doffing is distributed
and wound. One warper’s bobbin holds the yarn from 8
spinning-bobbins, so two “ doffings ” of 422 spinning-bobbins fill
the set, and the inner halves of the 100 warpers’ bobbins tend to come
from one doffing and the outer halves from another; possibly from
another spinning-machine. The warpers’ bobbins are transported
in “ skips ” of 70 to 80 bobbins to the next machine, the warping-frame,
where 500 bobbins are mounted on spindles in a “ creel ”
and their yarns delivered together in the form of a sheet on to one
large bobbin called the “ back beam.” The seven or eight skips
that go to fill a creel are picked at random from the production, and
though the bobbins from one skip are usually put together in the
creel, the identity of the doffings is lost. The possible difference
between the doffings represented at the inside and outside of the
warpers’ bobbins persists in the back beam on individual ends, but
for the whole beam the differences due to spinning-machines are
effectively averaged, and only the possible difference due to time
remains; the outer half of the back beam was probably spun about
2½ hours before the inner half. The whole beam represents roughly
5 hours’ spinning production (or, rather, 80 per cent. of the spinning
production of this count and quality, as 20 per cent. goes to make
other cloths).

Six “ back beams ” are then unwound together so that their
respective sheets of yarn are superimposed, making one sheet of
3,000 threads. This sheet is “ sized ” with a boiling mixture of
starch and tallow in water, dried, measured and wound on weavers’
beams. The cloth is required in “ pieces ” 80 yards long. The
length of warp required to produce this length of cloth is 93.3 yards,
so the sized sheet is marked every 93.3 yards. Four of these lengths
fill one weaver’s beam, so one “ set ” of six back beams makes 30
weavers’ beams.
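The figure of 30 weavers' beams to the set, and the "roughly 5 hours" of spinning held on a back beam, can both be recovered from the lengths already quoted. The sketch below assumes that a back beam runs the full length of yarn on a warper's bobbin (8 spinning-bobbins of about 1,400 yards); that assumption is not stated in the text, and the variable names are illustrative.

```python
YARDS_PER_SPINNING_BOBBIN = 1400
SPINNING_BOBBINS_PER_WARPERS_BOBBIN = 8
ENDS_PER_BACK_BEAM = 500
BACK_BEAMS_PER_SET = 6
WARP_YARDS_PER_PIECE = 93.3
PIECES_PER_WEAVERS_BEAM = 4
YARDS_PER_LB = 14 * 840          # 14s count
DOFFING_LB = 50
MACHINES = 5
HOURS_PER_DOFFING = 2
SHARE_FOR_THIS_CLOTH = 0.8       # 20 per cent. goes to other cloths

# Assumption: the back beam is as long as the yarn on one warper's bobbin.
back_beam_yards = SPINNING_BOBBINS_PER_WARPERS_BOBBIN * YARDS_PER_SPINNING_BOBBIN
threads_in_sheet = BACK_BEAMS_PER_SET * ENDS_PER_BACK_BEAM

weavers_beam_yards = PIECES_PER_WEAVERS_BEAM * WARP_YARDS_PER_PIECE
weavers_beams_per_set = back_beam_yards / weavers_beam_yards

# Weight of yarn on one back beam, and the spinning time it represents.
back_beam_lb = ENDS_PER_BACK_BEAM * back_beam_yards / YARDS_PER_LB
spinning_lb_per_hour = MACHINES * DOFFING_LB / HOURS_PER_DOFFING * SHARE_FOR_THIS_CLOTH
hours_per_back_beam = back_beam_lb / spinning_lb_per_hour

print(threads_in_sheet, round(weavers_beams_per_set, 2), round(hours_per_back_beam, 2))
```

Under that assumption the sheet is 11,200 yards long, which divides into almost exactly 30 beams of four 93.3-yard lengths, and each back beam carries a little under 5 hours of the spinning production destined for this cloth.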

This superimposition of the back-beam sheets results in a com¬ 
plete mixing of the threads in the set, so any group of threads across 
the sheet should be like any other group. The possible difference 
due to the time of spinning persists, so the first 15 weavers’ beams 
may be slightly different from the last 15 if, for instance, a count 
control change was made in the spinning-frames between doffings. 




There are usually about 80 looms weaving this cloth. A loom 
weaves up the warp on the weaver’s beam in about 9 days, so on
the average 9 or 10 looms require new full beams every day and at 
any instant the looms are weaving from all sizes of beams. 

The looms absorb a set of beams every 3½ days, so the cloth
produced on any one day may come from three different sets. The 
cloth is delivered as it is produced. So, usually, the pieces in a 
delivery will be from different looms and from different sets. A large 
trade is conducted in small orders, however, say of 50 pieces each, 
of various qualities of cloth differing only in details; a few ends to 
the inch more or less, or different counts of weft, etc. A consignment
of these pieces would usually include the whole order, and so would 
be drawn from several weeks’ weaving, but all the pieces would come 
from one set of back beams and from only a few looms. 
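The rate at which the looms use up beams follows from the number of looms and the life of a beam. A minimal sketch, taking 80 looms, 9 days to the beam and 30 weavers' beams to the set as quoted; the variable names are illustrative.

```python
LOOMS = 80
DAYS_PER_BEAM = 9
BEAMS_PER_SET = 30

# New beams wanted each day when the looms are at all stages of their beams.
beams_needed_per_day = LOOMS / DAYS_PER_BEAM

# Time for the looms to absorb one set of 30 weavers' beams.
days_per_set = BEAMS_PER_SET / beams_needed_per_day

# Number of sets represented in the looms at any one time.
sets_in_looms = DAYS_PER_BEAM / days_per_set

print(round(beams_needed_per_day, 1), round(days_per_set, 2), round(sets_in_looms, 2))
```

A set is thus absorbed in a little over three days, and since a beam lasts nine days the looms hold beams from between two and three sets at once, which agrees with the statement that a day's cloth may come from three different sets.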

Weft . 

The weft yarn count is 2-fold 22s, spun by twisting together two 
threads of single 22s yarn. Five spinning-machines spin the singles 
yarn from the same mixing of cotton in doffings of 422 bobbins 
weighing 50 lb. as before. This yarn is wound, two ends together, on 
to bobbins holding 2½ lb. of yarn on machines 18 spindles long, so, 
although the winding process is continuous, the yarn in both threads 
and often in the whole bobbin comes from one doffing. The 2½-lb. 
bobbins then take their place in five doubling machines of 340 
spindles each, which produce the weft ready for weaving on small weft 
bobbins holding 600 yards of yarn each. The 340 weft-bobbins are 
packed in three weft skips and taken to the looms. One spindle 
produces 2.1 lb. per week, so the 2½-lb. bobbin lasts 6½ working days. 
A full 2½-lb. bobbin is put in the doubling-machine as a bobbin becomes 
empty, so at any instant all sizes of bobbins are represented in the 
machine, and one weft doffing includes yarn from at least 19 spinning 
doffings. Successive weft doffings should therefore be very similar, 
but a considerable difference between weft bobbins in a doffing is to 
be expected, and some difference between doffings from different 
doubling-machines may occur. The extra time taken to spin the 
finer count of the weft and the time taken in doubling makes the total 
processing time for weft yarn about equal to that of warp, so that on 
the average the cloth is made from the same age of yarn both ways.* 
But the warp will be from three mixings and the weft from five. 

The weft bobbin fits in the shuttle and flies from side to side of the 
loom, interlacing with the warp threads to form the cloth. When the 
weft bobbin empties in the shuttle, it is replaced by a full bobbin, and 

* This fact in particular applies only to this cloth. Weft yarn is usually 
from more recent mixings than the warp with which it is woven. 





so on throughout the warp. One weft-bobbin weaves 13 inches of 
cloth, so two weft skips, each holding 113 weft-bobbins, will make 
one piece of cloth. These skips will have been made at different 
times and possibly on different machines, so there is a chance that the 
weft strength of one half of the “piece” will be different from the 
other half. 

When the length mark on the warp is woven into cloth, the piece 
is cut off, measured and weighed, and the loom goes on weaving the 
next piece. The tension in the warp yarn in weaving is controlled by 
a weight and lever system which is adjusted manually and, though 
the loom will not weave with the warp very tight or very slack, there 
is a wide range of intermediate tensions at which the loom will weave 
but will produce pieces of varying lengths. It seems likely that some 
of the considerable variation in strength found between pieces 
sampled on one day may be caused by this variation in tension. A 
description of this length variation has been given elsewhere,³ but the 
following summary may be useful for an understanding of the problem 
as it may affect cloth strength :— 

Pieces from one weaver’s beam vary in length. 

Some variation is associated with weavers’ beams (that is, with 
looms as set for one weaver’s beam). 

Small differences in design of loom make appreciable differences 
in the length of pieces of similar cloth made on different looms. 

There seems to be a day to day variation which may be caused 
by weather. 

There is certainly a month to month variation which is quite 
likely to be caused by the weather. 

On the average, pieces from all parts of the beam tend to be 
the same length, but pieces from the beginnings and ends of beams 
tend to vary more than pieces from the middles of beams. 

(3) Description of Data and some Problems. 

Fig. 2 and Table I (see p. 65) show, for the cloth described above, 
the control chart and corresponding test figures for 1936. These are 
kept to see that the cloth conforms to the customer’s specification 
and as a check on the general quality of the production. The pro¬ 
duction was sampled by taking a 12-inch cutting of cloth from the 
end of each of five pieces taken from the looms on one day. One warp 
strip 12 inches long by 3½ inches wide, and one weft strip 12 inches 
long by 4 inches wide,* were prepared from each cutting. These 10 
strips were exposed in a special box overnight to a relative humidity 

* The specified sample width is 4 inches for both warp and weft, but the 
warp samples had to be narrower to break within the capacity of the testing 
machine. 




of 76 per cent., removed one by one, and tested rapidly in an atmo¬ 
sphere of 65 per cent. R.H. 

The points in Fig. 2 are the means of the five cuttings taken on 
one day. The control levels are calculated from these figures, using 
the range of the five strips tested on one day, on the assumption 
that the within day variation is normal. The mean ranges are : 
warp, 54.4 lb.; weft, 49.6 lb. These figures were divided by 2.326 
to obtain the standard deviation, and the result divided by $\sqrt{5}$ and 
multiplied by 1.96 and 3.09 to obtain the limits below (or above) 
which 1 in 40 and 1 in 1,000 means should fall. There is an obvious 
lack of control in both warp and weft figures. 

[Fig. 2. Control Chart of Mean Cloth Strength from Data of Table I. Two panels, WARP and WEFT, plotted by month from January to December. Lines marked: the specified strength; the 2.5% limit, calculated from specified strength $\pm\,1.96\,\sigma_s/\sqrt{5}$; and the 0.1% limit, calculated from specified strength $\pm\,3.09\,\sigma_s/\sqrt{5}$, where $\sigma_s$ is estimated from the mean range of the five strips in one sample.] 
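
The calculation of the control limits described above may be sketched as follows. The mean ranges, the factor 2.326 and the multipliers 1.96 and 3.09 are those quoted in the text; the function name and its arguments are illustrative assumptions, and the sketch returns only the distance of each limit from the specified strength.

```python
import math

RANGE_FACTOR_N5 = 2.326  # factor quoted in the text for samples of five


def limit_half_widths(mean_range, n=5, factor=RANGE_FACTOR_N5):
    """Return (s.d. of single tests, 1-in-40 half-width, 1-in-1,000 half-width)."""
    sd = mean_range / factor         # standard deviation estimated from the mean range
    sd_mean = sd / math.sqrt(n)      # standard deviation of a mean of n tests
    return sd, 1.96 * sd_mean, 3.09 * sd_mean


for name, mean_range in (("warp", 54.4), ("weft", 49.6)):
    sd, w40, w1000 = limit_half_widths(mean_range)
    print(f"{name}: s.d. {sd:.2f} lb., limits at +/-{w40:.2f} lb. and +/-{w1000:.2f} lb.")
```

The limits themselves are then the specified strength plus or minus these half-widths.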

Some of the figures in Table I have a suffix N, which signifies that 
the cutting was taken from the first piece from a warp. The pieces 
for sampling are picked at random, and so, as there are four pieces 
on a warp, one quarter of the cuttings should be marked N. There 
should be 57½ cuttings marked N, and there are actually 58. Usually, 
the cuttings in one week’s sample marked N will have come from the 
same set of back beams, so the variation in warp strength between 
N’s in one week should be less than the general within week variation. 
















No such difference is to be expected in the weft tests. The warp N’s 
might be expected to show a change in warp quality 3J days before 
the average of the remaining figures would show it, but such changes 
are probably too slow for this to be demonstrated by the figures. As 
the arrangement of both warp and weft threads in the cloth is affected 
by the tension in weaving, some correlation in strength between warp 
and weft tests on the same piece might be expected. I have not been 
able to find any in the raw figures. 

Table II (see p. 72) gives the results of a special test of cloth 
strength of a coarse fabric of plain weave, which arose out of a 
complaint of low weft strength. At the end of February 1935, 
half-yard cuttings were taken from 100 of the several hundred pieces 
held in stock. The cuttings were tested twice warp way and twice 
weft way on strips 4 inches wide x 6f inches between the jaws, 
after conditioning overnight in the humidity box as described above. 
The loom number, the date on which the piece was taken from the 
loom, and the piece length, were noted. These data are interesting 
because they show the variation that can be present in 100 pieces 
delivered from stock, and though by no means complete from the 
point of view of statistical analysis, they can be divided into two 
periods of production: October-November 1934, and January-February 
1935, and collected under their respective loom numbers, 
so it should be possible to split the variance among several headings 
and to discover the important sources. The piece length can be 
controlled to some extent in the loom. It would be useful to know 
what increase in cloth strength and improvement in regularity of 
strength could be expected to result from a closer control of piece 
length. 

Other problems arising out of these considerations are: 

(1) How should a manufacturer sample his production: 

(a) for a routine check on the general strength of his 
product; 

(b) to be sure that a delivery will pass the specification ? 

(2) Experience has suggested that the distribution of tests 
from one piece of cloth is skew with a tail of weak breaks (see, 
for instance, Fabrics Co-ordinating Research Committee, 1st 
Report,⁴ which gives the following data for 1,000 
strength tests on the warp of a 12/203 duck cloth: 

Class mean, lb.   400  405  410  415  420  425  430  435  440  445  450 
Frequency           2    4    4   17   27   27   50   49   68   76   95 

Class mean, lb.   455  460  465  470  475  480  485  490  495 
Frequency         113  125  105   97   73   38   10   10    1    total: 1,000) 





Table II. 

Strength of 100 Pieces of Cloth Sampled from Stock on One Day. 

Column headings: Loom No., followed by four groups of three columns (Length, yds.; Strength in lb., Warp; Strength in lb., Weft), set out under the two period headings “Made October-November 1934” and “Made January-February 1935”. 

2 

109i 

285 

305 












280 

305 










4 

108 

265 

380 

1071 

245 

330 









255 

385 


275 

325 







5 







113 

305 

300 












300 

345 




10 

110 

260 

390 

lOGJ 

275 

330 

108 

300 

315 






280 

405 


290 

330 


305 

310 




15 

108 

300 

390 












275 

325 










18 







1071 

305 

315 












295 

345 




31 

105 

280 

360 

106 

275 

355 

108 

295 

335 






260 

375 


280 

365 


305 

355 




32 







1121 

260 

310 












290 

370 




36 

105 

285 

390 












280 

365 










39 







107 

285 

350 












205 

370 




47 







108 

310 

360 












290 

400 




48 

106 

260 

345 

1081 

270 

320 









280 

370 


275 

345 







70 

109 

295 

305 




106 

300 

340 

106 

280 

115 



280 

365 





285 

370 


270 

370 

70 







106 

240 

320 











1 

265 

370 




322 







1111 

310 

390 












300 

325 




133 

105$ 

280 

365 












275 

390 










139 







107 

290 

370 












300 

315 ! 




146 







107 

285 

370 




: 


i l 






275 

395 




149 







108} 

215 

360 












255 

370 




150 







1091 

320 

310 1 












286 

380 




152 

106V 

265 

370 

109 

295 

355 

1051 

285 

370 






280 

380 


265 

370 


310 

305 




151 







107 

255 

300 












300 

350 j 




159 

1071 

305 

385 

1101 

280 

420 









300 

370 


270 

380 







159 

104J 

280 

380 












290 

370 










160 

107 

275 

370 




1071 

275 

380 






275 

385 





280 

355 




165 

106 

255 

380 

107 

275 

350 









285 

385 


260 

385 







168 

110 

295 

325 












295 

330 










169 

1091 

310 

350 

103 

270 

320 









295 

345 


215 

320 







180 

106V 

285 

260 

106 

310 

235 

1091 

275 

365 






290 

390 


305 

400 


295 

255 




181 







104V 

300 

370 

108 

285 

370 









265 

380 


250 

330 

182 







105 

280 

375 

106 

285 

380 









290 

355 


280 

395 

183 







109 

275 

315 












280 

335 




188 

110 

265 

365 




1051 

295 

375 

109 

310 

3M0 



280 

315 





305 

375 


27*1 

<>55 

188 







116 

285 

870 












295 

340 







Table II (continued). 

193 







110} 

285 

355 

107} 

265 

375 









290 

380 


305 

385 

195 







107 

285 

375 












290 

345 




196 







106 V 

280 

380 












270 

350 




198 







107 

260 

360 












290 

320 




199 

107 

290 

345 




107 

260 

360 






275 

385 





260 

310 




201 

107 

280 

360 












290 

355 










202 

108 

275 

350 




107 

280 

390 

106} 

280 

360 



265 

355 





295 

385 


290 

350 

204 







106 

230 

350 












285 

350 


: 


206 







109 

305 

365 












300 

365 




211 







107 

275 

130 

112 

280 

390 









270 

380 


295 

345 

223 

105V 

305 

255 




10GV 

300 

355 






275 

305 





265 

290 




224 







108 

300 

405 












290 

375 




231 

110 

290 

410 

108 

290 

375 









310 

405 


285 

390 







235 

108 

295 

390 




110 

295 

380 






300 

385 





280 

350 




236 

105 

275 

355 




110 

265 

330 






290 

355 





290 

340 




240 

108 

280 

350 












265 

300 










245 







105 

295 

360 












285 

330 




216 

106 

265 

340 












270 

325 










248 







104 i 

300 

340 












305 

335 




219 







108 

250 

340 












290 

305 




251 







107 

275 

365 












300 

360 




252 

107 

260 

375 












300 

325 










258 

108 

280 

120 












290 

385 










260 ! 







109 

285 

400 












265 

100 




261 







110 

260 

355 












255 

355 




262 

_ 

265 

370 












275 

200 










263 

115 

290 

390 












270 

370 










266 







107 

285 

330 












305 

330 




267 j 

_ 

295 

415 

109 i 

300 

395 









255 

405 


310 

275 







269 | 







108} 

295 

415 












300 

395 




286 

105} 

270 

350 




104 

370 

360 






275 

360 





270 

355 




289 ! 

107 

305 

375 












295 

370 










290 i 

108 

270 

345 




107 

280 

350 






310 

350 





280 

345 




335 

106 

255 

380 












263 

385 












Does not even this degree of skewness make it inadvisable to 
specify a minimum strength for a single test specimen? But 
this type of specification is used in the British Standards Institution 
specification 6 F.1 4-oz. Aeroplane linen and elsewhere. 
Testing 1,000 strips of cloth from one piece is laborious, but if 
the shape of the within piece frequency distribution could be 
deduced from series of, say, five tests from a piece, the data 
could be obtained directly from routine mill records such as 
Table I. 

The mean of 5 tests from a cutting should be approximately 
normally distributed, but the data of Table I suggest that weekly 
means from 5 cuttings are not normally distributed, and it seems 
quite likely that the variation between cuttings is not distributed 
normally either. If this is so, how should control limits be 
calculated ? 

(3) The tests from new warps on one day are likely to be 
from one set of back beams, so the warp tests, at least, should 
be more alike than other tests. A manufacturer could sample 
his production fairly conveniently by taking a strip from the 
first piece woven from each beam or a strip from each end of 
that piece. Would this provide a more efficient sample than one 
taken at random ? 
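
The degree of skewness referred to in problem (2) can be put into a figure by a short sketch using the frequency table quoted there. The frequencies are entered exactly as printed; as printed they sum to 991 rather than the stated 1,000, so one or more entries may be misprinted and the result is indicative only. The function name is illustrative, and the moment coefficient of skewness is chosen here as one convenient measure, not necessarily the one the Committee used.

```python
# Class means (lb.) and frequencies as printed in the table under problem (2).
class_means = list(range(400, 500, 5))
frequencies = [2, 4, 4, 17, 27, 27, 50, 49, 68, 76, 95,
               113, 125, 105, 97, 73, 38, 10, 10, 1]


def moment_skewness(values, weights):
    """Moment coefficient of skewness, m3 / m2**1.5, for grouped data."""
    n = sum(weights)
    mean = sum(v * w for v, w in zip(values, weights)) / n
    m2 = sum(w * (v - mean) ** 2 for v, w in zip(values, weights)) / n
    m3 = sum(w * (v - mean) ** 3 for v, w in zip(values, weights)) / n
    return m3 / m2 ** 1.5


g1 = moment_skewness(class_means, frequencies)
print(f"tests entered: {sum(frequencies)}, skewness: {g1:.2f}")
```

A negative coefficient corresponds to the tail of weak breaks described in the text.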

Table III. 

                                          Warp Tests.              Weft Tests. 
                                      Method 1.  Method 2.     Method 1.  Method 2. 
Period of tests                        1932-34    1935-36       1932-34    1935-36 
Number of weekly means                   113         89           113         89 
Mean tensile strength, lb.              647.05     660.51        389.75     388.88 
Mean range, lb.                          48.93      63.01         38.31      51.46 
Within sample variance *                442.7      733.8         271.3      489.5 
Variance of sample means, estimated 
  from mean range †                      88.4      146.7          54.0       97.8 
Variance of sample means, calculated 
  from the distribution of means       1082.0      680.7         346.0      198.3 

* The within sample variance was found by dividing the mean range by 
Tippett’s factor, 2.326, and squaring the result. 

† This line was obtained by dividing the figures in the line above by 5. 
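
The two footnoted lines of Table III can be reproduced from the mean ranges by the sketch below, which follows the footnotes literally. The dictionary layout and names are illustrative assumptions; small differences in the last figure from the printed table are to be expected from rounding.

```python
TIPPETT_FACTOR = 2.326  # factor quoted in the footnote for samples of five

# Mean ranges (lb.) from Table III.
mean_ranges = {
    ("warp", "method 1"): 48.93,
    ("warp", "method 2"): 63.01,
    ("weft", "method 1"): 38.31,
    ("weft", "method 2"): 51.46,
}

within_variance = {}
variance_of_means = {}
for key, mean_range in mean_ranges.items():
    # Footnote *: divide the mean range by Tippett's factor and square.
    within_variance[key] = (mean_range / TIPPETT_FACTOR) ** 2
    # Footnote dagger: divide the within sample variance by 5.
    variance_of_means[key] = within_variance[key] / 5
    print(key, f"{within_variance[key]:.1f}", f"{variance_of_means[key]:.1f}")
```

Comparing the last line printed here with the final row of Table III shows how far the observed variance of the weekly means exceeds what the within sample variation alone would give.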


(4) Consideration of Two Methods of Routine Sampling. 

Table III summarizes the results of two series of routine tests 
made on the same quality of cloth as supplied the data for Table I. 
In the first series of tests, made between 18.5.32 and 5.11.34 (method 1 
sampling), 1 yard of cloth was cut from a piece each week and 
















5 strips warp way and 5 weft way were tested from it. In the second 
series of tests, made between 23.2.35 and 30.10.36 (which includes 
most of Table I, with the warp figures increased in the proportion 
3½ to 4 to correct for a change of strip width in 1935), one 12-inch cutting 
was taken from each of five pieces each week and one strip warp 
way and one weft way were tested from each (method 2 sampling). 
In both series the means of the five tests and the range of the group 
of 5 tests were calculated. 

Let $\sigma_w^2$ = true variance between weeks, 

$\sigma_s^2$ = true variance within a day, but between pieces taken 
within a day, 

$\sigma_r^2$ = true variance within samples taken from a test cutting. 

Then denoting “is estimated by” by an arrow, we obtain from 
method 1: 

                                                          Warp.     Weft.    Degrees of Freedom. 
$\sigma_r^2 \rightarrow$                                  442.7     271.3         112        (1) 
$5(\sigma_w^2 + \sigma_s^2) + \sigma_r^2 \rightarrow$    5410.0    1730.0         452        (2) 

and from method 2 we get: 

$\sigma_r^2 + \sigma_s^2 \rightarrow$                     733.8     489.5          88        (3) 
$5\sigma_w^2 + (\sigma_r^2 + \sigma_s^2) \rightarrow$    3403.5     991.6         356        (4) 

                                 Warp.         Weft. 
From (1)          $\sigma_r$    21.04 lb.     16.47 lb. 
from (1) & (3)    $\sigma_s$    17.06 lb.     14.77 lb. 
from (3) & (4)    $\sigma_w$    23.11 lb.     10.03 lb. 





Substituting these values in (2) the agreement is not exact. It 
would seem that there was more variation in warp strength and less 
in weft strength between pieces in the period 1932-34 than in the 
period 1935-36. 
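
The solution of equations (1), (3) and (4) for the three standard deviations, and the comparison with equation (2), may be sketched as follows. The numerical estimates are those quoted above; the function and variable names are illustrative assumptions.

```python
import math

# Estimates quoted in equations (1) to (4), as (warp, weft).
EQ1 = (442.7, 271.3)     # sigma_r^2
EQ2 = (5410.0, 1730.0)   # 5(sigma_w^2 + sigma_s^2) + sigma_r^2, method 1
EQ3 = (733.8, 489.5)     # sigma_r^2 + sigma_s^2
EQ4 = (3403.5, 991.6)    # 5 sigma_w^2 + (sigma_r^2 + sigma_s^2), method 2


def solve_components(eq1, eq3, eq4):
    """Return (var_r, var_s, var_w) from equations (1), (3) and (4)."""
    var_r = eq1
    var_s = eq3 - eq1
    var_w = (eq4 - eq3) / 5
    return var_r, var_s, var_w


for i, name in enumerate(("warp", "weft")):
    var_r, var_s, var_w = solve_components(EQ1[i], EQ3[i], EQ4[i])
    predicted_eq2 = 5 * (var_w + var_s) + var_r
    print(f"{name}: sigma_r {math.sqrt(var_r):.2f}, sigma_s {math.sqrt(var_s):.2f}, "
          f"sigma_w {math.sqrt(var_w):.2f}; "
          f"(2) predicted {predicted_eq2:.1f}, observed {EQ2[i]:.1f}")
```

The predicted value of (2) falls below the observed figure for warp and above it for weft, which is the disagreement remarked upon in the text.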

We have four estimation equations for three sigmas. What is 
the best combination of these sigmas for estimating (a) control 
limits for routine tests, and (b) the control level for which the manu¬ 
facturer should aim when producing cloth to meet a specified mini¬ 
mum mean strength ? 

If cloth manufacturing were statistically controlled there would 
be no variation in mean strength from week to week with Method 2 
sampling. $\sigma_w$ is therefore a measure of the lack of control. Control 
charts of cloth strength should show when the mean strengths are 
varying more than would be expected from the method of sampling 
used, so the control limits should be calculated from $\sigma_r$, or from $\sigma_r$ 
and $\sigma_s$ when the sample consists of cuttings from different pieces. 
The control level, or average strength, for which the manufacturer 
should aim, should be high enough to include the week-to-week 
variation if he is to be sure of supplying satisfactory cloth all the 
time. The control level should therefore be calculated from $\sigma_r$, $\sigma_s$ 
and $\sigma_w$, combined in proportions depending upon the sampling method 
employed. The following discussion may be elementary, but the 
importance of the method of sampling is not generally realized 
either by those who draw up specifications or by those who test the 
cloth. The simplest sampling method is to take one test strip from 
one cutting of cloth but, in view of the variation found among 
single strips, it is usual to test five or six strips. These strips may 
be taken from one cutting or from several cuttings. The most usual 
method is to take all the strips from one cutting (method 1); one 
important testing house prefers to take two strips from each of three 
cuttings when possible. The sampling method should be chosen to 
give the most accurate estimate of the mean strength with a minimum 
amount of testing. The accuracy of this estimate is given by the 
standard deviation of the mean, calculated from: 

$$\text{Standard deviation of mean} = \sqrt{\frac{\sigma_s^2}{m} + \frac{\sigma_r^2}{mn}}$$ 

where m = number of cuttings tested, 

n = number of strips tested per cutting. 

[Fig. 3. Standard deviation (S.D.) of the mean plotted against the method of sampling. Left half of the diagram: more cuttings, from 10 down to 1, with one strip per cutting. Right half: more strips per cutting, from 1 up to 10, all from one cutting.] 

Fig. 3 shows the effect on the standard deviation of the mean of 
three methods of sampling when $\sigma_r$ and $\sigma_s$ are equal to the figures 
quoted above. In the right half of the diagram the usual method 





of sampling, taking several tests from one cutting, gives a con¬ 
siderable reduction in the standard deviation of the mean up to 
about four tests, but after that each additional test brings only a 
very small reduction. In the left half of the diagram the curves 
drop much more steeply, and show the great superiority of sampling 
one strip from each of several cuttings. Comparable points for the 
testing-house method of sampling, taking two strips from three 
cuttings, plotted as a circle and square for warp and weft respectively, 
show that this method gives a standard deviation of the mean of 
six strips very little less than that given by four strips from different 
cuttings. It seems to me that, as it is a simple matter to take a 
cutting from a piece of cloth, the testing-house method of sampling 
is not so good as Method 2, either for routine checking of quality or 
for testing individual deliveries. 
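
The comparison of sampling methods discussed above can be sketched numerically. The sketch assumes that the standard deviation of the mean of n strips from each of m cuttings takes the usual nested form $\sqrt{\sigma_s^2/m + \sigma_r^2/(mn)}$, and uses the values of $\sigma_s$ and $\sigma_r$ quoted earlier in this section; the function name and the plan labels are illustrative.

```python
import math


def sd_of_mean(m, n, sigma_s, sigma_r):
    """S.D. of the mean of n strips from each of m cuttings (assumed nested model)."""
    return math.sqrt(sigma_s ** 2 / m + sigma_r ** 2 / (m * n))


# (sigma_s, sigma_r) in lb., as quoted in the text.
SIGMAS = {"warp": (17.06, 21.04), "weft": (14.77, 16.47)}

# Sampling plans as (number of cuttings m, strips per cutting n).
PLANS = {
    "Method 1: five strips from one cutting": (1, 5),
    "Method 2: one strip from each of five cuttings": (5, 1),
    "Testing house: two strips from each of three cuttings": (3, 2),
    "One strip from each of four cuttings": (4, 1),
}

for yarn, (sigma_s, sigma_r) in SIGMAS.items():
    for label, (m, n) in PLANS.items():
        print(f"{yarn:4s}  {label:55s} {sd_of_mean(m, n, sigma_s, sigma_r):6.2f} lb.")
```

On these figures the six strips of the testing-house plan give only a slightly smaller standard deviation than four strips taken from four different cuttings, and five strips from five cuttings do better than either.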

Specification T.C.5A, quoted earlier, states the number of threads 
per inch, warp and weft way, and the weight per lineal yard in the 
cloth as delivered. The manufacturer has to estimate the count 
and quality of warp and weft that will give the weight and strength 
required. He may summarize his previous experience by reducing 
his raw cloth strength data to a figure more or less independent of 
the cloth structure, finding first the strength per thread tested, then 
multiplying this by the count of yarn. But having made this 
calculation and made what allowances he can for the effect of finish¬ 
ing, etc., the problem remains of how much he should allow for 
random variations, and finally for the usual lack of statistical control. 
Take, for instance, the warp-strength standard deviations calculated 
from Table III. The specified strength is 631 lb. Suppose this 
customer insists upon the maintenance of this minimum strength 
for means of 5 tests in 39 out of 40 means. Assuming normality as 
an approximation, the manufacturer should calculate the mean 
warp strength for which he should aim by adding $1.96 \times \mathrm{S.D.}_{\bar{x}}$ to 
the specified strength (where $\mathrm{S.D.}_{\bar{x}}$ is the standard deviation of the 
mean). But what value will he take for $\mathrm{S.D.}_{\bar{x}}$? 

(a) He may assume that the variation to be expected in the 
bulk is as he finds it to be within a piece, and so take: 

$$\sigma_r = 21.04\ \text{lb.}, \quad \mathrm{S.D.}_{\bar{x}} = \frac{21.04}{\sqrt{5}} = 9.40, \quad \text{so control level} = 649.4\ \text{lb.} \qquad (5)$$ 

(b) Or he may know about the piece to piece variation and 
include this by calculating: 

$$\mathrm{S.D.}_{\bar{x}} = \sqrt{\tfrac{1}{5}\{5(\sigma_w^2 + \sigma_s^2) + \sigma_r^2\}} = 30.23, \quad \text{so control level} = 690.3\ \text{lb.} \qquad (6)$$ 

(c) But then, having discovered the variation between pieces, 
he might take his tests from five different pieces, and then use 

$$\mathrm{S.D.}_{\bar{x}} = \sqrt{\tfrac{1}{5}(\sigma_s^2 + \sigma_r^2)} = 12.11, \quad \text{so control level} = 654.7\ \text{lb.} \qquad (7)$$ 

(d) But he still should have included the week to week variation 
by calculating 

$$\mathrm{S.D.}_{\bar{x}} = \sqrt{\tfrac{1}{5}(5\sigma_w^2 + \sigma_s^2 + \sigma_r^2)} = 26.09, \quad \text{so control level} = 682.1\ \text{lb.} \qquad (8)$$ 
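
The four control levels (5) to (8) can be recomputed from the warp standard deviations by the following sketch. The specified strength of 631 lb., the multiplier 1.96 and the three standard deviations are taken from the text; the function names are illustrative, and the last decimal may differ slightly from the printed figures through rounding.

```python
import math

SPECIFIED = 631.0   # specified warp strength, lb.
Z = 1.96            # multiplier for 39 means in 40
SIGMA_R, SIGMA_S, SIGMA_W = 21.04, 17.06, 23.11   # warp values, lb.

var_r, var_s, var_w = SIGMA_R ** 2, SIGMA_S ** 2, SIGMA_W ** 2

# Standard deviation of a mean of five tests under each assumption.
sd_mean = {
    "(a) within piece only": math.sqrt(var_r / 5),
    "(b) five strips from one piece, all sources": math.sqrt((5 * (var_w + var_s) + var_r) / 5),
    "(c) five pieces, no weekly variation": math.sqrt((var_s + var_r) / 5),
    "(d) five pieces, with weekly variation": math.sqrt((5 * var_w + var_s + var_r) / 5),
}


def control_level(sd):
    """Mean strength to aim for so that 39 means in 40 exceed the specification."""
    return SPECIFIED + Z * sd


levels = {case: control_level(sd) for case, sd in sd_mean.items()}
for case, level in levels.items():
    print(f"{case:45s} S.D. {sd_mean[case]:6.2f}  control level {level:6.1f} lb.")
```

The spread between cases (a) and (b), some 40 lb., shows how much the answer depends on which sources of variation the manufacturer admits.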

The grand means of both periods given in Table III come well 
below the control levels 690.3 lb. for Method 1 sampling, and 687.8 lb. 
for Method 2 sampling, and 24 per cent. and 16 per cent., respectively, 
of the sample means come below 631 lb., but only 5 sample means in 
113 in Method 1 and 1 mean in 89 in Method 2 occur more than 
$1.96 \times \mathrm{S.D.}_{\bar{x}}$ below the grand means, that is, below 587.8 lb. and 
603.7 lb., respectively, which agrees quite well with the theoretical 
2½ per cent. which the limit should exclude. 

The final estimate of the control level, (8), is 8.1 per cent. above 
the specification and 3.3 per cent. above the mean of a year’s routine 
tests, but none of the cloth delivered has been rejected for low 
strength, so the customer evidently allows a larger proportion than 
1 mean in 40 to occur below the specified strength. It seems to be 
fairly general for the customer to allow some latitude in the tensile 
strength, but until this latitude is described in the specification there 
is not very much point in having a specification. Much work requires 
to be done to determine whether the variation of cloth strength is 
generally as great as in the examples quoted, but I believe that the 
work will be less and the results more conclusive if members of this 
Section will turn their attention to the matter and suggest how these 
data should be collected. 

(5) Practical Difficulties with Mill Data. 

There is always the danger that the workman who makes records 
or does sampling as a part of his work may discover that his over¬ 
looker is satisfied with the appearance of correct recording and has 
not the time or inclination to check it. A workman must think for 
himself most of the time, but when he starts thinking for himself about 
sampling or testing methods, he makes a great deal of trouble for the 
technician. The following four examples have actually occurred in 
the mill during the past year, and may serve to stress the importance 
of checking the consistency of mill statistics before using them. 





(a) A control chart of cloth strength showed a large difference 
in weft strength between two five-test means. The sampling was 
nominally by Method 2 : i.e., one test from each of five cuttings 
from different pieces, but on closer examination of the samples it 
was obvious that each set of five had been cut from one piece and 
marked with different loom numbers. The operative had reverted 
to Method 1 sampling (except for the false loom numbers), for 
which wider control limits should have been applied. 

(b) During the collection of data for calculating the variability 
of piece length of a cloth, several pieces averaging 86 yards long 
occurred in 2 weeks’ records among the usual run of pieces which 
averaged 80 yards with a standard deviation of length of 1.12 
yards. No mistake had been reported, and the instruction card 
showed no shortage of pieces from the set and appeared to be 
correctly filled in. Investigation in the mill proved that the 
operative in the sizing machine had put the wrong wheels in the 
length measuring and marking device, and when he discovered his 
mistake after losing 200 yards in overlengths he arranged with the 
winding overlooker to make the next set of back beams 200 yards 
longer, kept the instruction card back till the second set was sized 
and then falsified both sets of figures. 

(c) About the same time, back beam weights in one quality 
were showing that the warp yarn was fine, and the test records 
showed that the spinning-machine draft wheels were being 
changed to make the yarn coarser, but the yarn remained fine. 
The test records were kept on cards in such a way that the over¬ 
looker could not see the tests or wheel records for more than two 
half-days back, so he could not see from the records, without 
taking special trouble, that the wheels were changed from 58 to 
59 each day (and presumably changed themselves back from 
59 to 58 overnight). 

(d) Remarkable regularity of count was noticed in the records 
from one mill testing-room, so, as the spinning-machines were 
tested by weighing one lea from each of four bobbins from each 
machine daily, the standard deviation of the means was cal¬ 
culated from the recorded means and also estimated from the 
variation between leas. The estimate was twenty times greater 
than the recorded means indicated, and parallel tests in the 
laboratory gave a standard deviation of the mean more than twice 
as large as the estimate. The tester’s subjective attitude to his 
work was confirmed by the proportion of odd numbers to even 
numbers in his lea strength figures, which was 11 : 89. The limits 
set by the management were so close that strict application of the 
limits to the results of strictly objective testing would have resulted 





in the draft wheels in 55 per cent, of the machines being changed 
every day. The tester had preferred falsifying his figures to 
demonstrating the absurdity of the testing routine. 

The author’s thanks are due to the Directors of Messrs. Ashton 
Brothers & Co., Ltd., for permission to present this paper. 


References. 

1 Clibbens and Little, Journal of the Textile Institute, XXVII, t285-t304. 

2 Bayes, ibid., XXVI, t120-t122. 

3 Idem, ibid., XXVII, t53-t83. 

4 Department of Industrial and Scientific Research, Fabrics Co-ordinating 
Research Committee, 1st Report. H.M. Stationery Office, 1925. 


Discussion on Mr. Bayes’s Paper. 

Mr. Tippett : It gives me very great pleasure to propose a vote 
of thanks to Mr. Bayes for coming here and reading his paper to us 
this evening. I think it is probably true to say that most of the 
statistical methods that we use nowadays have been developed, not 
as a result of pure theory, but to meet certain definite practical 
needs. Probably what might be called the older classical statistics, 
using correlations and so forth, and developed by the late Professor 
Karl Pearson and his school, were to meet the needs of the bio¬ 
metricians. At later times we have had methods evoked by the 
needs of agriculture and due largely to Professor Fisher, and still 
more recently we have had another weapon in control charts designed 
to meet the particular needs of industry. 

All these methods, although they have the same fundamental 
mathematical basis, are given a definite twist by the subject to 
which they are applied, and the general body of statistical method 
lacks reality unless the people who develop those methods at Univer¬ 
sities and places of that kind can get first-hand knowledge of the 
various fields of application. It is one duty of this Society to supply 
the link between theory and practice, and we have to thank Mr. 
Bayes to-night for his help in bringing before us a cogent example 
of an industrial field to which statistical methods apply. 

Mr. Bayes has been good enough not to select from his experience 
a few nice things that fit into some pretty scheme or theory; he has 
thrown all his data at us and said, “ Do what you can with that,” 
and I think we appreciate his doing so. 

There are one or two remarks I should like to make, first as 
regards the control chart. It shows rather alarming variations and 
lack of control, as Mr. Bayes has pointed out. I should like to ask 
what he does, and what the people at the factory do, about these 
things? I once had a talk with a spinner who used to put on the 
walls of his office testing figures taken from yarn in somewhat the 





same kind of way as a control chart, of which he was very proud. I 
asked him what he did with these figures, and he replied that he did 
not do much, but he liked to have them there to see how things were 
going on, and I think probably it did keep his people up to scratch 
to know that they were being watched. It may have had another 
function for the producer. Many of the complaints that come in 
about the quality of the cloth or material are not objective complaints 
based on quality alone, but are occasioned by a change in the market. 
If prices are rising, the merchant will take goods of appreciably 
lower quality; if the market is falling, he gets very critical and 
makes complaints. If a manufacturer has his control chart before 
him, it may or may not show a lack of control, but at any rate he can 
see whether any particular batch about which he has received a 
complaint is really abnormally bad or not, and he may be able to 
stiffen his attitude in meeting the complaint. If a control chart has 
to be used for that purpose, I suggest that instead of the chart in
Fig. 2, with control limits calculated from the variations within a
week, Mr. Bayes might care to consider new and wider control
limits calculated from the variation between the weekly means. It
would look a little less heart-breaking to the mill staff.
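
Mr. Tippett's two ways of setting limits for a chart of weekly means may be compared in a minimal sketch. The weekly figures below are hypothetical, and the multiplier k = 3 is an assumed convention, not a value taken from the paper; the function name `control_limits` is likewise illustrative only.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical weekly samples of 5 strength tests (illustrative numbers only).
weeks = [
    [98, 102, 101, 97, 100],
    [105, 107, 104, 108, 106],
    [95, 93, 96, 94, 97],
    [101, 99, 103, 100, 102],
]

def control_limits(weeks, k=3.0):
    """Return (centre, narrow, wide) limits for a chart of weekly means.

    narrow: based on the pooled variation within a week.
    wide:   based on the variation between the weekly means themselves.
    k is an assumed multiplier, not a figure from the paper.
    """
    n = len(weeks[0])
    means = [mean(w) for w in weeks]
    centre = mean(means)
    # Pooled within-week variance gives the standard error of a weekly mean.
    within_var = mean(stdev(w) ** 2 for w in weeks)
    se_within = sqrt(within_var / n)
    # Standard deviation of the weekly means, week-to-week changes included.
    sd_between = stdev(means)
    narrow = (centre - k * se_within, centre + k * se_within)
    wide = (centre - k * sd_between, centre + k * sd_between)
    return centre, narrow, wide

centre, narrow, wide = control_limits(weeks)
print("centre:", round(centre, 2))
print("limits from within-week variation:", [round(x, 2) for x in narrow])
print("limits from between-week variation:", [round(x, 2) for x in wide])
```

When the weekly means wander, as they do in these made-up figures, the second pair of limits is the wider, which is the effect Mr. Tippett has in view.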

I am glad he has given a section on “Considerations of two
methods of routine sampling.” In industry, generally speaking, the
statistician does tend to require a little more sampling than the
technical man is willing to give; but it is not always realized that the
amount of sampling that is done is seldom used to the best advantage,
and here in Fig. 3 we see that a Testing-House sampling method
requires six tests to produce a result that would have been given by 
four tests obtained by the best method. To the statistician it is 
obvious that if one is going to sample a batch, the more widely one 
can spread his nets and the fewer tests there are to any individual 
unit, the more accurate will be the sampling. Although that is 
obvious to the statistician, it is not obvious to other workers like 
chemists and engineers, and I am glad Mr. Bayes has brought 
forward that point. There is a great deal of work still to be done in 
developing what may be termed the economics of sampling. 

I have great pleasure in proposing the vote of thanks to Mr. 
Bayes for the valuable paper he has given. 

Professor Pearson : I should first like to join with Mr. Tippett 
in congratulating Mr. Bayes warmly for the paper he has given us. 
It covers a great deal of interesting ground and raises far more 
questions which are worth looking into, from the point of view of the 
industry, than any of us have been able to deal with in the limited 
time that has been available. The most important problem that 
Mr. Bayes has put before us is essentially different from that presented 
in the last two papers dealing with industrial questions that have 
been before this Section: Dudding’s and Jennett’s paper on the manufacture
of Electric Lamps and Hampton’s and Gould’s paper on
Spectacle Glass Manufacture. In those cases we were concerned
with planned experiments inside a Works, carried out in an endeavour
to improve quality; in the present case I think that Mr. Bayes is
most anxious to obtain suggestions as to the sampling routine which
will best serve two purposes:

1. To give the manufacturer a picture of his general level of
quality.

2. To determine how that level should be fixed so that tests
carried out by the consumer will not lead to rejection of the
material, i.e. to dissatisfaction.

Having regard to the complex process of manufacture which we
have heard described, it is not easy to lay down any very clear testing
routine straight off, but I feel that many of the difficulties could be
overcome if there were sufficient co-operation between the producer
and the user of the cloth.

[Figure: Diagram illustrating variation in cloth strength.]

In the first place, I think it would be good for the producer and 
consumer to have before them not only a chart of means like that 
shown in Fig. 2, but one which gave also the individual values for 
tests from the same piece and different pieces made in the same week. 
Fig. 4, based on Mr. Bayes’s Tables I and III, suggests the kind of
picture I have in mind, showing the variation in strength within
pieces, between pieces and between weeks. With this as a typical
picture of the kind of variation to be expected in cloth manufacture,
one must ask the consumer to try to define what quality characteristics
he considers desirable, e.g. does he want:

(a) The average strength to be above a certain level?

(b) The minimum strength to be above a certain level?

(c) Uniformity in strength, and if so how would he measure
this?

When, and only when, these points have been settled, is it possible 
for the statistician—or, for that matter, for any one else, since the 
problem is essentially statistical—to suggest rational sampling 
procedures to be carried out: 

(a) by the consumer, to check up on quality;

(b) by the producer, to make sure that his level of production
is up to the required standard.

It is possible, for example, to deal with uniformity, once the 
standard desired has been defined. A problem of this kind has 
arisen and been solved tentatively in the specification of quality of 
electric lamps (B.S.S. No. 161-1934). Here there is some similarity
with the cloth problem, since the variation in quality between
different batches of lamps is greater than that within the batches; 
further, the consumer may be receiving lamps from several batches. 

If the large-scale purchaser of cloth cannot define precisely what
he needs, and carries out sampling tests which may be not only
inadequate but irrational, it is not surprising that the producer is
in a rather difficult position. Given a specification which does not
state whether the requirements relate to a mean or to a minimum,
the level of production he ought to maintain is just a matter of guesswork.
The conscientious producer will keep his level unnecessarily
high, while the less scrupulous may get away with a lot of lower-quality
goods.

What is necessary is co-operation, and the ideal to aim at is 
possibly one under which most of the testing is carried out by the 
producer, in some agreed routine manner, subject to checks by a
standardizing authority, who would grant some form of Certification 
Mark. The possibility of developing such schemes has, I know, been 
under consideration by the British Standards Institution for some 
time.* 

Mr. Bayes has raised a number of specific points regarding the 
applicability of statistical theory to certain aspects of his work; 
e.g. questions regarding normality and the consequences of departure
from normality. We have looked into some of these points in the 
Department of Statistics at University College; the answers are 
rather too technical for the present discussion, but I shall hope to be 
able to add them in print. Indeed, Mr. Bayes has asked such a 
number of questions important to his work, that I hope he will not 
be disappointed if we do not answer more than a few of them to-night! 
Let me say, in warmly seconding the vote of thanks proposed by Mr.
Tippett, that I for one hope that this meeting to-night will not be
just a first and last contact.

* See B.S.I. No. 600, 1935. The Application of Statistical Methods to
Industrial Standardization and Quality Control, pp. 46-51.

Colonel Hidden said that Mr. Bayes in his paper had made two 
charges against Government Specification No. 5A. That specification
had since been amended, and was now No. 5 of 1935, but did
not differ in material respects. The first charge was that the 
specification did not state the strength in pounds. On behalf of 
the Government Department responsible for that specification, he 
pleaded guilty to this charge of omission. It had been whispered to 
him that a chit of a girl—a typist—was probably responsible for 
that omission, as on turning up the Woollen and Worsted Specification 
he saw that the word graced the top of the Tensile Strength Column. 
He had been in communication with the Secretary of the Committee, 
and he could assure Mr. Bayes that that omission would be rectified. 

The second charge was that the Specification did not state whether 
the specified strength referred to the mean or to individual strips, 
or whether it was a minimum. At the moment he could only reply 
to this charge on behalf of the War Department as unfortunately the 
Chairman of the Committee responsible for the specification was in 
Devonport, and notice of the meeting had been too short to allow of
communicating with him, but the Army practice was to regard the
specified tensile strength figure as a minimum figure, and to take
samples from the middle of three separate pieces of cloth from a
delivery and test three warp way and three weft way from each
sample. He did not know if that was the general practice of all 
Service and Government Departments, but Major Myers represented 
the Air Ministry, and would be able to speak about his Service. 

In general, they were against tolerances in specifications; the 
Army bought cotton material in very large quantities; the competition 
was exceedingly keen, and the experience was that some contractors 
were tempted to tender to the lower figures. The matter of tolerances
was considered by a Standardization Committee on Clothing
Materials about 1922, called The Interdepartmental Committee on
the Standardization of Clothing Materials, its members consisting of
representatives of all the fighting Services, all Government Departments
issuing public clothing, and the Police, and the Committee
arrived at this conclusion: “The Committee has considered the
question of specifying a definite margin for the strength test, but
does not advise its inclusion at present, recommending that if the 
omission be found to lead to any difficulty the matter be brought 
before the Technical Co-ordinating Committee on Textiles and 
Clothing set up as a result of the report of the Mond Committee, 
which will come into being now that this Committee has made its 
final report.” 

Colonel Hidden had not been able to trace that the matter had
ever been up again before that Committee.

The War Department was also against specifying counts and 
twists, as they rather doubted the wisdom of this, and thought it 
might be a disadvantage to do so in that a maker might be able to 
fulfil the specification's requirements as regards counts and twists,
but still fail to attain the required strength and weight, and be
inclined to blame the specification. They thought it better to leave 
that matter to the discretion of the contractor. Government 
specifications must be elastic. While no two people would agree as 
to what was to be put into a specification, they tried to avoid over¬ 
loading them, and did not wish to tie too much the hands of the cloth 
maker. 

In conclusion, he would like to say, on behalf of the Technical 
Co-ordinating Committee responsible for the specification, that they 
welcomed constructive criticism, and any conclusions that might be 
arrived at by the meeting would receive very careful consideration. 

Major H. Myers thought there was very little for him to add to 
the very clear account that Colonel Hidden had just given of the 
principles underlying the drafting of these Government specifications. 

He himself spoke for the Air Ministry, and as regards the actual 
testing practice of materials, this was exactly as Colonel Hidden had 
described, as also was the method of selecting samples. There was 
one point, however, that he would like to mention, because Mr. 
Bayes had referred specifically to aeroplane linen. This seemed to 
be rather “dragged in by the hair” in a discussion on cotton fabrics,
but on this particular point he thought Mr. Bayes would agree that 
it differed absolutely and was poles apart from this question of 
commercial cotton cloth. Mr. Bayes was undoubtedly right in 
saying that while the tensile strength of warp and weft in cotton 
cloth might be a sound indication of its quality, it was not the prime 
factor in determining whether the cloth was satisfactory for the 
purpose required; but in aeroplane linen there was no question that 
the tensile strength of warp and weft was the most important factor 
regarded from the safety aspect, and therefore it was not unreasonable
to make that particular characteristic of the cloth a very rigid
requirement. Although actually the specification in that particular 
instance might be open to Mr. Bayes’s charge that it looked as though 
the cloth could be rejected or accepted on the test on one strip, it 
was not so in practice. The practice was to test six strips each way 
from every piece of fabric, and these strips of warp and weft were cut 
in such a way that the same thread did not occur in any two strips. 
The method of testing that type of fabric was not the same as that 
described in the paper; it was not tested on machines having a 
constant rate of travel, but on machines having a constant rate of 
loading. Unfortunately, so far as his own experience went, there 
was no direct correlation between the results of these two methods of 
testing. He would be extraordinarily appreciative if Mr. Bayes or 
anyone else could throw any light on that particular problem, 
because it was bound to upset any statistical approach to a conclusion 
when there were two or more methods of testing, both in extensive 
use, and bearing no determinable relation to each other. 

Major Myers said that the impression he had arrived at, perhaps 
wrongly, was that the paper appeared to rest on two basic assumptions. 
The first was that there was something approaching absolute uniformity
in the fundamental properties of the initial cotton structure
from which to start. Was that so? Did not the yarns vary a
great deal in themselves, and even along the length of one yarn? 
The second was that the method of testing on a Goodbrand machine 
was a very rough-and-ready sort of business; from a laboratory 
point of view, it was almost as bad as it would be for an engineer to 
test a high-grade steel by hanging pound weights on it until it broke. 
He could not help feeling they might be going into very fine discrimination
on the statistical side, when in fact the crudity of the testing
method in common use entirely masked things of that kind; in
other words, there might be misleading results on account of the
limitations of the testing machine. This point of view was merely 
thrown out as a suggestion. He was concerned with the testing of a 
large variety of materials, of which textiles were one group, and 
often found himself faced by this difficulty, of the inaccuracies in 
the means of testing completely swamping the actual variations in 
the material itself. 

Major Myers said he would like to add his quota of congratulations 
on the paper, in which Mr. Bayes had made an extremely intricate 
subject much clearer than one could have supposed possible. 

Mr. Cochran said Mr. Bayes had drawn attention to the data 
presented in Table II, in which he showed the result of tests on warp 
and weft strength on one hundred samples of cloth taken from stock 
in a single day, and had suggested that from these figures some idea 
might be obtained of the variations which occurred in warp and 
weft strength and in length between pieces of cloth in stock and the 
sources to which that variation was due. There were four classes 
into which variations might be grouped as presented in Table II. 
In the first place, the pieces were divided according to whether they 
were made in October-November 1934, or in January-February 
1935. Then, within each period of production, the pieces were 
taken from different looms. Occasionally (as, for example, with
looms 4, 10, 31 and so on) several pieces had been taken from the
same loom, so that one might estimate from these what sort of 
variation occurred between different pieces of cloth from the same 
loom. Finally, as Mr. Bayes had mentioned, two tests were made on 
each cutting, so that an estimation might be made of how much 
variation occurred between different tests on the same cutting. 

An eye inspection of the Tables would indicate that in warp and 
weft strength certainly the greater part of the variation shown in 
these classes could be attributed to the last category, that between 
different tests on the same cutting. This impression was confirmed 
by an analysis of the variation. Further, Mr. Cochran thought that 
any differences which existed in this particular set in the length, 
warp or weft strength from different looms must have been small, 
because the variation between the looms was in no case significantly 
above that within looms between different cuttings. 
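
The kind of comparison Mr. Cochran describes, variation between cuttings set against variation between the two tests on the same cutting, can be sketched as a one-way analysis of variance. The strengths below are hypothetical and are not taken from Table II; the function name `one_way_anova` is illustrative only.

```python
from statistics import mean

# Hypothetical data: each inner list holds the two tests made on one cutting.
cuttings = [[310, 330], [325, 305], [318, 342], [300, 322]]

def one_way_anova(groups):
    """Return (between_ms, within_ms, ratio) for equal-sized groups."""
    k = len(groups)          # number of cuttings
    n = len(groups[0])       # tests per cutting
    grand = mean(x for g in groups for x in g)
    group_means = [mean(g) for g in groups]
    # Sum of squares between cutting means, scaled to the single-test level.
    between_ss = n * sum((m - grand) ** 2 for m in group_means)
    # Sum of squares of tests about their own cutting mean.
    within_ss = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    between_ms = between_ss / (k - 1)
    within_ms = within_ss / (k * (n - 1))
    return between_ms, within_ms, between_ms / within_ms

between_ms, within_ms, ratio = one_way_anova(cuttings)
print("between cuttings mean square:", round(between_ms, 1))
print("within cuttings mean square: ", round(within_ms, 1))
print("ratio:", round(ratio, 2))
```

A ratio near or below unity, as in these made-up figures, is what one would expect when nearly all the variation lies between tests on the same cutting.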

There was one rather anomalous figure in the weft strengths which 
could be seen at once, in loom in, where 130 was given in a column 
of test results which were nearly all above 300. Perhaps Mr. Bayes
would mention what had gone wrong there. 





The standard error of the mean of two tests was 4.4 per cent. for
warp strength and 6.7 per cent. for weft strength. That was estimated
from the variations between looms, and the standard error of the
length of a piece of cloth was about 1.7 per cent.

Mr. Bayes had mentioned that one might see from these figures 
whether the mean length, warp or weft strength appeared to have 
varied between October/November and January/February production. 
The differences that actually occurred could easily be accounted for 
by variations within each period, so that there was no indication of 
any real change. He had been unable to find any correlation between 
the variations in this short period of the warp and weft strength, 
or any correlation between either of these separately and the length 
of cloth, so that the results indicated that within the range of variation
that had occurred, controlling the length of cloth would not
improve either warp or weft strength, or reduce their variability. 

Mr. Welch said that he would like to comment shortly on the 
data given in Table I of the paper. These consisted of warp and weft 
strengths for 50 samples of 5, covering a period of about 50 weeks,
each sample representing pieces taken on one day from 5 different
looms. Mr. Bayes had already analysed the results into “within
sample” and “between sample” variations, deducing from these
on p. 75 an estimate, σ_w, of the true “between weeks” variation.
Although σ_w does give a measure of the true variation that actually
occurred in 1936, it is not safe to apply this estimate, without further 
investigation, to future years. As is seen in Fig. 2 of the paper, the 
week to week variation cannot be considered random, but exhibits 
definite trends, which do not necessarily characterize the variation 
likely to happen in other years. Further, overall yearly variation of 
this type is not what we are most interested in. We really want to 
detect, by control charts or otherwise, any excessive short period 
changes. The use of σ_w in this connection would probably lead to
too wide control limits. 

Fig. 2 suggests that the data of 1936 may be profitably divided 
into four periods of roughly 3 months each. Mr. Welch had carried 
out the analysis of variance and covariance of warp and weft strength, 
appropriate to this further division. The results are presented in 
the accompanying Table IV. It is seen immediately that, for weft 
strength, all the “between week” variation can be ascribed to
variation between the means of the 3-monthly periods. The amount
of variation within these periods is quite consistent with the “ within 
sample ” variation. For warp strength, also, the variation from week 
to week in short periods does not appear to be large. The analysis 
also throws some light on the question of correlation between warp 
and weft strength, the possibility of which Mr. Bayes referred to on 
p. 71. Whereas there is no appreciable correlation in the data as a 
whole (line 5 of the table), there is between the weekly means (line 3).
Further breaking up, however, shows this to be due to the fact that
the warp and weft means of the 3-monthly periods go up and down
together. There is no evidence that the shorter-term fluctuations
are correlated (line 2). Similarly, whatever the factors are which
cause the variability between samples taken at one time from different
looms, they act on the warp and weft strength quite independently
(cf. the non-significant correlation of -0.118 in line 4). In short,
the only clear-cut effects in the 1936 data are the long period ones.

Table IV.

Analysis of Variance and Covariance for Data of Table I.

                                   Degrees         Warp.               Weft.          Warp and Weft.
Source of Variation.               of          Sum of    Mean      Sum of    Mean     Sum of     Correlation
                                   Freedom.    Squares.  Square.   Squares.  Square.  Products.  Coefficients.

1. Between 3-monthly periods          3         2503      834        947      316      1383        ...
2. Between weeks within periods      46         2241       49       1127       24        74       0.047
3. Total between weeks               49         4744       97       2074       42      1457       0.465
4. Within samples of 5              200         4721       24       3908       20      -506      -0.118
5. Total variation                  249         9465      ...       5982      ...       951       0.126
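
The correlation coefficients in Table IV can be checked from the printed sums of squares and products, since each is the sum of products divided by the square root of the product of the two sums of squares. The sketch below does this, taking 46 degrees of freedom for line 2 (the figure consistent with 49 less 3 and with the printed mean squares). It also applies the usual components-of-variance estimator to the warp figures; that estimator is an assumption here, offered only to illustrate Mr. Welch's point about σ_w, and is not necessarily the calculation made on p. 75 of the paper.

```python
from math import sqrt

# (degrees of freedom, warp SS, weft SS, sum of products) as printed in Table IV.
table_iv = {
    "between periods": (3, 2503, 947, 1383),
    "weeks within periods": (46, 2241, 1127, 74),
    "total between weeks": (49, 4744, 2074, 1457),
    "within samples": (200, 4721, 3908, -506),
    "total": (249, 9465, 5982, 951),
}

def correlation(ss_x, ss_y, sp):
    """Correlation coefficient from two sums of squares and a sum of products."""
    return sp / sqrt(ss_x * ss_y)

def variance_component(ms_between, ms_within, n=5):
    """Assumed estimator: (between MS - within MS) / n for samples of n."""
    return max((ms_between - ms_within) / n, 0.0)

for name, (df, warp, weft, sp) in table_iv.items():
    print(f"{name:22s} r = {correlation(warp, weft, sp):+.3f}")

# Warp strength: week-to-week standard deviation over the whole year,
# and within the 3-monthly periods only.
ms_within = 4721 / 200
sd_all_weeks = sqrt(variance_component(4744 / 49, ms_within))
sd_short_period = sqrt(variance_component(2241 / 46, ms_within))
print("whole-year estimate:   ", round(sd_all_weeks, 2))
print("within-period estimate:", round(sd_short_period, 2))
```

On these figures the within-period estimate comes out smaller than the whole-year one, which agrees with the remark that limits based on the yearly figure would probably be too wide for detecting short-period changes.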


A further point which Mr. Bayes raised (p. 70) was the possibility 
that, for warp strength, there may, at a given time, be less variability 
between those pieces which happen to be the first woven from a 
weaver’s beam than there is between pieces taken at random. In 
Table I samples cut from first pieces are marked with an N. Designating
samples not from first pieces by N̄, we may analyse the
variability within samples of 5 into (a) within N's, (b) within N̄'s, and
(c) between N and N̄. Every sample will not necessarily contribute
to each of these three categories. However, if we sum over all the
46 samples for which the distinction into N and N̄ is made, we obtain
27 degrees of freedom for (a), 127 degrees for (b) and 30 degrees for
(c). The appropriate analysis is given in the accompanying Table V.

Table V.

Analysis of Within Samples Variation for Warp Data of Table I.

Source of Variation.                  Degrees of   Sum of     Mean
                                      Freedom.     Squares.   Square.

Within N's within samples                 27          814      30.2
Within N̄'s within samples                127         2656      20.9
Between N and N̄ within samples            30          783      26.1
Total within samples of 5                184         4253      23.1

There is here no indication of greater similarity between samples 
from first pieces than between samples taken at random. The reason 
Mr. Bayes gave for believing that the first pieces might have been
more uniform, was that they were likely at a given time to have come
from the same set of back beams. However, he notes elsewhere in 
the paper that pieces from the beginnings and ends of beams vary 
more in length than pieces from the middles of beams. Perhaps 
here there are two factors acting in different directions in their 
influence on the variability of warp strength. 
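
The figures of Table V can be checked, and the comparison Mr. Welch draws from them made explicit, in a short sketch. The dictionary keys are descriptive labels introduced here for illustration; the degrees of freedom and sums of squares are those printed in the table.

```python
# (degrees of freedom, sum of squares) from Table V, warp strength.
table_v = {
    "within first pieces": (27, 814),
    "within other pieces": (127, 2656),
    "between the two kinds": (30, 783),
}

def mean_square(df, ss):
    """Mean square = sum of squares / degrees of freedom."""
    return ss / df

ms = {name: mean_square(df, ss) for name, (df, ss) in table_v.items()}

# If first pieces were more alike than pieces taken at random, this ratio
# would fall well below 1; it would be judged against tables of the
# variance ratio with 27 and 127 degrees of freedom.
ratio = ms["within first pieces"] / ms["within other pieces"]

total_df = sum(df for df, _ in table_v.values())
total_ss = sum(ss for _, ss in table_v.values())

for name, value in ms.items():
    print(f"{name:22s} mean square = {value:.1f}")
print("ratio, first pieces to others:", round(ratio, 2))
print("total:", total_df, "degrees of freedom,", total_ss, "sum of squares")
```

The ratio exceeds unity, so the printed figures give no sign that first pieces are the more uniform.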

Mr. Daniels thought Professor Pearson raised an interesting
point when he mentioned control of the variation of strength in a 
piece. As Mr. Bayes had pointed out in his paper, the quality of a 
piece of cloth was influenced by the strength at its weakest places 
rather than by the average strength of the piece, and from this point 
of view it would seem advisable to fix control limits not only for the 
mean strength, but also for the standard deviation or range of
strength of a number of test strips from the same piece. 

If this additional information were thought to be valuable, then 
the sampling method of taking one test strip from each piece was not 
adequate. 

Perhaps a less stringent procedure which would be as effective in 
rejecting pieces having an excessive number of weak places would be 
to devise control limits for a single quantity like the sample mean 
minus some constant multiple of the sample standard deviation or 
range; this would be analogous to the “ weakest strip,” but less 
subject to random variation. 
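
Mr. Daniels's suggested single quantity, the sample mean less a constant multiple of the sample standard deviation, may be sketched as follows. The strip strengths, the multiple k = 2 and the limit of 280 are all hypothetical values chosen for illustration; none is proposed in the discussion.

```python
from statistics import mean, stdev

def lower_strength_index(strips, k=2.0):
    """Sample mean minus k sample standard deviations (k is assumed)."""
    return mean(strips) - k * stdev(strips)

def flag_piece(strips, limit, k=2.0):
    """True when the index falls below a (hypothetical) control limit."""
    return lower_strength_index(strips, k) < limit

# Two hypothetical pieces with nearly the same mean strength.
uniform_piece = [300, 305, 298, 302, 301, 299]
patchy_piece = [330, 270, 325, 275, 320, 286]

for name, strips in [("uniform", uniform_piece), ("patchy", patchy_piece)]:
    print(name, round(mean(strips), 1), round(lower_strength_index(strips), 1),
          "flagged" if flag_piece(strips, limit=280) else "passed")
```

A chart of means alone would not separate these two pieces, whereas the index does, and it uses every strip rather than only the weakest one.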

Periodic trends similar to those exhibited in Mr. Bayes’s data had 
been commonly experienced in worsted mills, and could in many cases 
be traced to what amounted to lack of common sense shown by the 
operatives in the mill. To cite a recent case in point, conditioning 
plant had been installed in a mill with the object of maintaining 
constant humidity during spinning. At 9 a.m., not long after the 
conditioning plant had been turned on, the yarn count was tested, and 
change wheels were altered to bring it to specification. At 11 a.m., 
when the humidity had attained equilibrium, the count was again 
tested, and the wheels altered again to bring the count back! This
undue readiness to alter wheels was felt to contribute largely to 
periodic trends and consequent lack of control, at any rate in worsted 
mills. 

Mr. Gosset wished to say a word for the control chart. It had
been talked about as a sort of wall ornament, but in point of fact it 
was a very useful thing. He had had control charts in the laboratory 
which had led up to nearly halving a laboratory error, because they 
gave a hint as to what to look for. 

And in this discussion, although the method of testing the 
strength had been aspersed, it was clear from the control chart that 
the method was good enough to show secular changes, unless indeed, 
as was unlikely, the secular changes were due to the testing machine 
itself. 

The Chairman said that the minutes of the last meeting which 
had been read at the commencement of the proceedings, had ended
by saying that the discussion was wound up by the Chairman. That,
no doubt, was one very useful function of the occupant of the Chair. 
Another was that of playing the part of the more or less intelligent 
listener; and frankly that was all that he himself felt capable of 
doing. When he received a copy of the paper, he had, in accordance 
with his usual habit, read the introduction and the conclusion first, 
and thought he understood the problem propounded. On then 
reading through the whole paper he rather lost himself, in the 
second part, though gradually he found the track; but when he got 
beyond that he was soon very much out of his depth. 

After having heard the paper read, however, he would like to 
express his agreement with Major Myers who had said that the author 
of the paper had succeeded in making the problem much clearer than he
had at first thought it could be made. If the function of the reader of 
a paper in this Section was to throw down a problem and set out the 
difficulties, giving his hearers something to think about very seriously, 
not only at the meeting but afterwards, Mr. Bayes appeared, according
to the discussion, to have succeeded admirably. He gathered
that Mr. Bayes would not be disappointed at not getting, that 
afternoon, any very definite answers to the questions he had posed. 

The Chairman had great pleasure in putting to the meeting the 
hearty vote of thanks moved by Mr. Tippett and seconded by 
Professor Pearson; in asking Mr. Bayes to reply, he reminded him 
that he would have the opportunity of expanding his remarks in the 
Supplement.

The vote of thanks was then carried unanimously. 

Mr. Bayes's reply was as follows: 

I must thank you all for the kind way in which you have listened 
and commented, and I should like especially to thank Mr. Tippett 
and Professor Pearson for their essential help in the preparation of 
this paper; but for their efficient goading it would certainly not have 
appeared in its present form. I am very much indebted to them. 

The discussion has been full and very helpful to me. Concerning 
what is done in the mill about the lack of control shown by control 
charts; special tests of yarn counts and threads per inch are made on 
samples giving extreme results on the strength test charts, but extremes
in other charts do not usually give rise to special investigations.
So far the main service of my charts has been to show how
restricted is the use made of the abundant statistics in the mill. 

Colonel Hidden has given us information essential for the use of 
the Government Specification No. 5; I should like to suggest that it 
should be included in that specification at the next revision. The 
standard error of the mean resulting from the method of sampling he 
quotes, which may for convenience be called “ War Office ” sampling, 
may be calculated from the within cutting and between cutting standard
deviations, as was done for the curves and “Testing House”
points in Fig. 3. Taking the figures used for the warp curve in Fig.
3, namely σ_s = 2.58 per cent. and σ_r = 3.18 per cent., and keeping
the same notation as before, the following standard errors are
obtained with the various sampling methods:


Sampling Method.               Total Number    Number of     Tests per     S.E.
                               of Tests,       Cuttings,     Cutting,      per cent.
                               m × n.          m.            n.

“War Office”                       9              3             3          1.828
“Testing House”                    6              3             2          1.976
Extended “Testing House”           8              4             2          1.711
“Method 2” (of Table III)          5              5             1          1.831
Extended “Method 2”                9              9             1          1.365


From these figures it is clear that “Method 2” sampling, requiring
only 5 tests, gives as accurate an estimate of the mean strength of
the delivery as the “War Office” sampling requiring 9 tests. The
equivalence of the results of these two methods of sampling depends
upon the size of σ_s and σ_r. In my experience of loom state cotton
fabrics σ_s and σ_r are usually about equal, so I should expect “Method
2” sampling on the average to give slightly more accurate estimates
of the means than the “War Office” sampling. It is not true to say
that “Method 2” is therefore suitable for all conditions and requirements.
If the whole delivery of cloth is accepted or rejected on the
mean strength tested, the criterion of a good sampling technique is 
that it should give the most accurate estimate of the mean with the 
least labour of sampling and testing, but, supposing that only the 
pieces giving the weaker tests are rejected, or supposing, as Mr. 
Daniels suggested, that the within piece variation is an important 
characteristic of the cloth, more than one test should be made on each 
cutting. In practice a delivery of cloth is usually accepted or rejected 
as a whole. I do not wish to appear dogmatic on the subject of this 
“Method 2” sampling, but I suggest that the method of sampling
now used for cloth-strength testing might with advantage be reconsidered.
The use of minimum strengths in specifications should be
considered at the same time. Even with aeroplane linen one cannot 
be certain that there is no cloth in a piece weaker than the weakest 
test strip broken, unless the whole of the piece is tested, and then no 
cloth is left for the aeroplane. As one must, in practice, judge the 
quality of a piece of cloth from a small sample, the specification 
and the sampling method should be designed to give the most accurate
estimate of the probable lowest strength of the piece, which is not
necessarily obtained by considering the strength of the weakest strip 
tested. A better estimate of the lowest strength of the piece would 
probably be obtained from the mean and standard deviation of the 
tests. 
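
The standard errors tabulated above for the several sampling methods can be reproduced from the two quoted standard deviations, on the usual formula for n tests on each of m cuttings. In the sketch below the assignment of 2.58 per cent. to the between-cutting and 3.18 per cent. to the within-cutting standard deviation is an inference: it is the assignment under which the tabulated figures are recovered, and the names `SIGMA_BETWEEN`, `SIGMA_WITHIN` and `se_of_mean` are introduced only for illustration.

```python
from math import sqrt

SIGMA_BETWEEN = 2.58  # per cent., between cuttings (inferred assignment)
SIGMA_WITHIN = 3.18   # per cent., within cuttings (inferred assignment)

def se_of_mean(m, n, sb=SIGMA_BETWEEN, sw=SIGMA_WITHIN):
    """Standard error of the mean of n tests on each of m cuttings."""
    return sqrt(sb ** 2 / m + sw ** 2 / (m * n))

# (number of cuttings m, tests per cutting n) for each sampling method.
methods = {
    '"War Office"': (3, 3),
    '"Testing House"': (3, 2),
    'Extended "Testing House"': (4, 2),
    '"Method 2"': (5, 1),
    'Extended "Method 2"': (9, 1),
}

for name, (m, n) in methods.items():
    print(f"{name:26s} {m * n:2d} tests   S.E. = {se_of_mean(m, n):.3f} per cent.")
```

The between-cutting term is divided only by m, so adding further tests on the same cutting cannot reduce it; this is why five tests spread over five cuttings do as well as nine tests on three.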

To the two machines for cloth-strength testing mentioned by
Major Myers (namely, the Goodbrand constant rate of traverse
machine and the Avery constant rate of loading machine) one
should add the American “grab” tester, which worries the Lancashire
manufacturer, if not the Air Ministry, and which gives results
bearing no simple relation to either the Goodbrand or Avery tests. 
There are many parallel examples in other industries. In tests of
hardness of metals, for instance, the Brinell number and the Shore
Scleroscope number do not always rank specimens in the same order
of hardness. The figures quoted above for within and between
cutting variations of one type of cotton cloth cannot be applied
directly to a linen cloth tested on a different type of machine, but it
is certain that a real variation from piece to piece found by the Goodbrand
test will be found by the Avery test, and will occur in linen
cloth as in cotton, and, of course, the same statistical approach will 
serve for both. 

In further discussion of the points raised by Major Myers, I should
modify his first basic assumption and quarrel with his second.
There is no “absolute uniformity” of anything in cloth except of the
glucose residues in the molecular chains of cellulose, but the mean
value and variability of each feature of the cloth are moderately
stable. My first basic assumption is, therefore, that the variation
found in cloth-strength tests is characteristic of cloth and is
sufficiently stable for measurements of the variation to be useful. My
second assumption is that the Goodbrand machine measures the 
strength of individual test strips consistently. There are technical 
difficulties in the use of the Goodbrand machine, as in the use of most 
apparatus, and some of the details are arbitrary—the rate of traverse 
and the dimensions of the pendulum quadrant, for instance—but 
successive calibrations agree to within small limits. All pendulum 
and spring strength-testing machines are calibrated with weights, 
and the Avery machine in use hangs the weights up itself, by rolling 
lead shot into a tin. If the variation of results is to be taken as a 
measure of the accuracy of the testing machine, I would remind Major 
Myers that the Fabrics Co-ordinating Research Committee 4 have 
reported test figures in which the Avery machine gave more irregular 
results than the Goodbrand.* 


Testing Machine.                                   Goodbrand.    Avery.

Mean warp strength, lb.                              454.075    420.405
Within piece standard deviation, lb.                  17.3[?]     20.25
Within piece coefficient of variation per cent.         3.82       4.75


In reply to Mr. Cochran, the 130-lb. result from loom xn caused
considerable anxiety at the time, but there was clearly nothing wrong 
with the machine, and no indication on the test strip that there had 
been any slipping in the jaws, or that one edge of the strip had been 
tighter than the other. Six of the results quoted for weft strength 
and one or two of the warp tests are below the main body of results. 
Some workers discard such figures, but low figures occur so generally 
that they must be regarded as a real feature of the Goodbrand test, 
and though this 130-lb. value was the lowest I have ever encountered,
there seemed to be no valid reason for rejecting it. The problem
of how these results should be treated in routine testing remains.

* The cloth used for this test was made from specially regular yarn; far
more regular than the yarn in the cloth described in Table I, but, on the
other hand, fewer threads were tested in each strip, and the arrangement
of the threads was not necessarily any better than in the cloth of Table I.

[Figure: “Method 1” sampling 18.5.32 to 5.11.34 and up to 15.2.33;
“Method 2” sampling 23.2.35 to 30.10.36 and after. ... changed to 3+ inches
and test figures increased proportionally for plotting.]

Mr. Welch suggested that the overall yearly variation shown in 
Fig. 2 is not of very great interest, but I can assure him that when the 
strength falls and stays low for weeks on end, and there seems to be 
no reason for it, the customer may become very interested indeed.
Fig. 5 shows the warp strengths for the five years covered by the two
periods in Table III, and Table VI gives an analysis of variance of
the first period 18.5.32 to 5.11.34, method 1 sampling:


Table VI.

                                      Degrees of   Sum of     Mean
                                      Freedom.     Squares.   Square.

1. Between 3-monthly periods               9        1266       141
2. Between weeks within periods          103        3406        33
3. Total between weeks                   112        4672        42
4. Within samples of 5                   452        2008         4
5. Total variation                       564        6680


As with the figures analysed by Mr. Welch, much of the “ between 
weeks ” variation can be ascribed to variation between the means of 
the 3-monthly periods. From the practical point of view, the 3-monthly
period variation appears to be the most important to
investigate. It may be, as Mr. Daniels suggests, that inadequate 
allowance is made for variations in atmospheric humidity when the 
material is tested for count during spinning, but it is very doubtful 
if more than half of the variation shown in the 3-monthly means
can be ascribed to this cause. The explanation of these variations 
lies so entangled in the whole process of cloth manufacture that only 
a part of it can be codified from the results of one type of test, but this 
discussion has very greatly helped to make plain what is to be 
expected of the strength test and from what fields further explanations
are likely to grow.
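
The arithmetic of Table VI can be checked with a short sketch. The degrees of freedom and sums of squares are copied from the table; the two variance ratios printed at the end are an illustrative addition and are not figures quoted in this reply, and the names `TABLE_VI` and `mean_square` are assumptions of the sketch.

```python
# Sketch: recompute the mean squares of Table VI and two variance ratios.
# Degrees of freedom and sums of squares are copied from Table VI;
# the ratios are illustrative and are not quoted in the text.

TABLE_VI = {
    "between 3-monthly periods": (9, 1266),
    "between weeks within periods": (103, 3406),
    "within samples of 5": (452, 2008),
}


def mean_square(df, sum_of_squares):
    """Mean square = sum of squares / degrees of freedom."""
    return sum_of_squares / df


mean_squares = {name: mean_square(df, ss) for name, (df, ss) in TABLE_VI.items()}

# Lines 1 and 2 of the table add up to line 3 ("total between weeks").
total_between_weeks_df = 9 + 103
total_between_weeks_ss = 1266 + 3406

# Ratio of each mean square to the one below it in the hierarchy.
ratio_periods_to_weeks = (
    mean_squares["between 3-monthly periods"]
    / mean_squares["between weeks within periods"]
)
ratio_weeks_to_within = (
    mean_squares["between weeks within periods"]
    / mean_squares["within samples of 5"]
)

for name, value in mean_squares.items():
    print(f"{name:32s} mean square = {value:6.1f}")
print(f"periods / weeks within periods        = {ratio_periods_to_weeks:.2f}")
print(f"weeks within periods / within samples = {ratio_weeks_to_within:.2f}")
```

The first ratio compares the 3-monthly period means with the week-to-week variation inside periods, which is the comparison the paragraph above relies on.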








Notes on some Statistical Problems Raised in Mr. Bayes’s Paper.

By E. S. Pearson and B. L. Welch. 

Mr. Bayes has pointed out that the distribution of cotton cloth
strength is probably not always represented by the normal probability
curve. He has then asked:

(a) How far departure from normality may be detected when
the data are in the form of a few tests made on each
of a number of different pieces of cloth?

(b) Whether skewness may make it inadvisable to specify a
minimum strength for single test specimens?

It has seemed to us worth while examining these points carefully,
since similar questions are likely to arise in other cases where the
applicability of routine methods of control is under consideration.

1. Variation among Tests from same Piece. 

The distribution of such tests is thought generally to be skew;
Mr. Bayes gives the distribution of 1000 tests from one piece of
duck cloth collected in connection with another investigation, and 
we shall examine on these data the adequacy of some of the practical 
statistical methods, derived on the assumption of normality, which 
he uses or might use. The frequency distribution is as follows : 


Table I.

Warp Strength in Duck Cloth.

Strength in lb.        Frequency.          Strength in lb.        Frequency.
(Central Values).  Observed. Graduation.   (Central Values).  Observed. Graduation.

400 or less            2        2.6        455                  113      106.6
405                    4        3.8        460                  125      109.5
410                    4        7.4        465                  105      105.5
415                   17       13.0        470                   97       93.6
420                   27       20.7        475                   73       74.2
425                   27       30.8        480                   38       49.2
430                   50       43.1        485                   19       23.1
435                   49       57.0        490                   10        4.2
440                   68       71.7        495                    1        0.0
445                   76       85.9
450                   95       98.1        Total               1000     1000


The moment coefficients of this distribution were found to be *

Mean = 454.09 lb.         Standard deviation = 17.3577 lb.

$\sqrt{b_1} = -0.421$     $b_2 = 2.748$
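
As a check on these figures, the moment coefficients can be recomputed from the observed column of Table I. The sketch below is a minimal illustration, assuming that the open group “400 or less” is centred at 400 lb. and that Sheppard's corrections take their usual form for a group width of 5 lb.; the function name `moment_coefficients` is an assumption of the sketch.

```python
import math

# Observed frequencies from Table I, keyed by central value in lb.
# Assumption: the open group "400 or less" is treated as centred at 400.
OBSERVED = {
    400: 2, 405: 4, 410: 4, 415: 17, 420: 27, 425: 27, 430: 50,
    435: 49, 440: 68, 445: 76, 450: 95, 455: 113, 460: 125, 465: 105,
    470: 97, 475: 73, 480: 38, 485: 19, 490: 10, 495: 1,
}
GROUP_WIDTH = 5.0  # lb.


def moment_coefficients(freq, h):
    """Return (mean, sd, sqrt_b1, b2), with Sheppard's corrections
    applied to the second and fourth moments about the mean."""
    total = sum(freq.values())
    mean = sum(x * f for x, f in freq.items()) / total

    def central(k):
        return sum(f * (x - mean) ** k for x, f in freq.items()) / total

    m2_raw, m3, m4_raw = central(2), central(3), central(4)
    m2 = m2_raw - h ** 2 / 12.0
    m4 = m4_raw - 0.5 * h ** 2 * m2_raw + 7.0 * h ** 4 / 240.0
    sqrt_b1 = m3 / m2 ** 1.5
    b2 = m4 / m2 ** 2
    return mean, math.sqrt(m2), sqrt_b1, b2


mean, sd, sqrt_b1, b2 = moment_coefficients(OBSERVED, GROUP_WIDTH)
print(f"mean = {mean:.2f} lb., standard deviation = {sd:.4f} lb.")
print(f"sqrt(b1) = {sqrt_b1:.3f}, b2 = {b2:.3f}")
```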

The value of $\sqrt{b_1}$ differs significantly from the normal value
$\sqrt{\beta_1} = 0$; $b_2$ is on the 5 per cent. level with regard to the
normal value, $\beta_2 = 3$.† We may ask however:

(a) How far is the distribution of range in small samples, say of
$n = 5$, from a population following this distribution the same as that
from a normal population? In particular, if we estimate the
population standard deviation from mean range in samples of 5
dividing by Tippett’s corrective factor, 2.326, shall we be much in
error? Or, again, if in a control chart for variation within pieces
we use range and the tabled limits for normal theory ((3), page 86),
will these control limits be wrongly placed?

(b) Had a long series of tests from one piece not been available,
but only 5 tests from each of a number of pieces, would it have been
possible to test for departure from normality?

(c) Basing the procedure on a knowledge of the standard deviation
of strength within pieces ($\sigma = 17.36$), and using the method suggested
by Tippett (4) in the case of normal variation, it would be possible
to determine the probability that the weakest strip in 5 taken from
a piece should have a strength any given amount below the true
average strength for the piece. How far would the observed skewness
in the strength distribution bias the results?

* Sheppard’s Corrections were used for the second and fourth moment
coefficients. If $m_2$, $m_3$ and $m_4$ are the second, third and fourth
moments about the mean, $\sqrt{b_1} = m_3/m_2^{3/2}$, $b_2 = m_4/m_2^2$.

† Tables of 5 per cent. and 1 per cent. significance levels have been given,
(1), page 248 and (2), page 224. Here and elsewhere the numbers in brackets
refer to list of references given at the end of this paper.

The following investigations have been carried out with the help
of students in the Department of Statistics at University College,
London.

(a) Range. Using Tippett’s Random Numbers (5), 200 samples
of 5 were randomly drawn from an infinite population, having
proportions in the groups exactly as for the observed distribution
of 1000 given above. The following table compares the mean and
standard deviation of range, and the number of samples with range
outside specified limits, (i) on normal theory, (ii) in experimental
sampling from the skew population.

Table II.

                                                      Frequency (in 200 Samples) beyond:
                  Mean Range.       S.D. of Range.        Lower.              Upper.
                                                      1%       5%         5%       1%
                                                      limit.   limit.     limit.   limit.

Normal Theory     2.326σ = 40.37    0.864σ = 15.00      2       10         10        2
Experiment            40.42             14.61           3        9          7        1

The four control limits are at 0.70σ, 1.04σ, 3.87σ and 4.59σ,
respectively ((3), page 86). The differences between theory and
experiment are clearly not significant, and suggest that no serious
error will be involved in using tests based on range, when dealing
with the variation in strength within pieces, at any rate if the
distribution is no more skew than that given. These results may be
compared with others previously reported ((2), pages 167-168)
and ((3), pages 95-96).
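
The sampling experiment on range can be imitated with pseudo-random numbers in place of Tippett’s Random Numbers. The sketch below is an illustration under stated assumptions, not a reproduction of the original draws: the open group “400 or less” is treated as 400 lb., the control limits are read as 0.70σ, 1.04σ, 3.87σ and 4.59σ with σ = 17.36 lb., and the seed and the names `simulate_ranges` and `count_beyond` are arbitrary choices of the sketch.

```python
import random
import statistics

# Observed frequencies from Table I (central values in lb.).
# Assumption: the open group "400 or less" is treated as 400 lb.
VALUES = list(range(400, 500, 5))
FREQS = [2, 4, 4, 17, 27, 27, 50, 49, 68, 76, 95,
         113, 125, 105, 97, 73, 38, 19, 10, 1]
SIGMA = 17.36  # within-piece standard deviation, lb.

# Control limits for range in samples of 5, in units of sigma
# (assumed reading of the four limits quoted in the text).
LIMITS = {"lower 1%": 0.70, "lower 5%": 1.04, "upper 5%": 3.87, "upper 1%": 4.59}


def simulate_ranges(n_samples=200, n=5, seed=1937):
    """Draw n_samples samples of size n and return the range of each."""
    rng = random.Random(seed)
    ranges = []
    for _ in range(n_samples):
        sample = rng.choices(VALUES, weights=FREQS, k=n)
        ranges.append(max(sample) - min(sample))
    return ranges


def count_beyond(ranges):
    """Count samples whose range falls beyond each control limit."""
    counts = {}
    for name, factor in LIMITS.items():
        limit = factor * SIGMA
        if name.startswith("lower"):
            counts[name] = sum(r < limit for r in ranges)
        else:
            counts[name] = sum(r > limit for r in ranges)
    return counts


ranges = simulate_ranges()
print("mean range   :", round(statistics.mean(ranges), 2), "(normal theory 40.37)")
print("s.d. of range:", round(statistics.pstdev(ranges), 2), "(normal theory 15.00)")
print("beyond limits:", count_beyond(ranges))
```

A different seed gives a different set of 200 samples, so the counts beyond the limits will wander by a few units from run to run, much as the experimental row of Table II differs from the normal-theory row.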

(b) Test of Normality Based on many Small Samples. If 5 tests
are made on each of a number of different pieces, these tests cannot
be pooled together when examining the form of distribution of strength
within pieces, owing to the significant between-piece variation.
We can, however, calculate $\sqrt{b_1}$ for each sample, and study the
resulting distribution. This was done for the 200 random samples
used above in investigating range; the results are shown below,
$\sqrt{b_1}/\sqrt{5}$ having been tabled for convenience.

Table III.

Distribution of $\sqrt{b_1}/\sqrt{5}$ in 200 Samples of 5.

Central Values.    −.65    −.55    −.45    −.35    −.25    −.15    −.05
Frequency             3       9    13.5    20.5      18      38    24.5

Central Values.    +.05    +.15    +.25    +.35    +.45    +.55    +.65    Total
Frequency            19     [?]     [?]       5     6.5       3       1      200

For normal theory

Mean $\sqrt{b_1} = 0$,   $\sigma_{\sqrt{b_1}} = \sqrt{3/8} = 0.612$,

but the precise form of the probability distribution is unknown.
That it will be of somewhat unusual shape is suggested by the
anomalous forms found for samples of size 3 and 4 (6), (7). The
distribution must, however, be symmetrical, and it is evident
from Table III that the observed distribution with 126.5 negative
and 73.5 positive values of $\sqrt{b_1}$ is definitely asymmetrical. Clearly,
therefore, with 200 samples of 5 we could have told that the
within-piece distribution of strength was negatively skew. The mean
$\sqrt{b_1}$ for the samples is −0.168,* and since the standard error of a
mean of 200 values is, on normal theory, $0.612/\sqrt{200} = 0.043$, this
differs significantly from $\sqrt{\beta_1} = 0$. If, however, less than, say, 50
samples of 5 tests had been available, the skewness in the strength
distribution might have been more difficult to detect.
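
The test on many small samples can be sketched in the same way: the skewness coefficient is computed for each sample of 5 and the mean of the coefficients is compared with its normal-theory standard error. This is an illustration only. It uses pseudo-random draws from the observed distribution of Table I rather than the original samples, treats “400 or less” as 400 lb., and discards the rare sample whose five values are all equal; the names `sqrt_b1` and `simulate_sqrt_b1` are assumptions of the sketch.

```python
import math
import random

# Observed frequencies from Table I (central values in lb.).
# Assumption: the open group "400 or less" is treated as 400 lb.
VALUES = list(range(400, 500, 5))
FREQS = [2, 4, 4, 17, 27, 27, 50, 49, 68, 76, 95,
         113, 125, 105, 97, 73, 38, 19, 10, 1]


def sqrt_b1(sample):
    """Skewness coefficient m3 / m2**1.5, moments taken with divisor n.
    Returns None when all values are equal (m2 = 0)."""
    n = len(sample)
    mean = sum(sample) / n
    m2 = sum((x - mean) ** 2 for x in sample) / n
    m3 = sum((x - mean) ** 3 for x in sample) / n
    if m2 == 0:
        return None
    return m3 / m2 ** 1.5


def simulate_sqrt_b1(n_samples=200, n=5, seed=1937):
    """Return sqrt(b1) for n_samples random samples of size n."""
    rng = random.Random(seed)
    values = []
    while len(values) < n_samples:
        g = sqrt_b1(rng.choices(VALUES, weights=FREQS, k=n))
        if g is not None:
            values.append(g)
    return values


values = simulate_sqrt_b1()
mean_value = sum(values) / len(values)
negatives = sum(v < 0 for v in values)
# Normal-theory standard error of the mean of 200 values of sqrt(b1), n = 5.
std_error = math.sqrt(3.0 / 8.0) / math.sqrt(len(values))

print(f"negative values: {negatives} of {len(values)}")
print(f"mean sqrt(b1) = {mean_value:.3f}, standard error = {std_error:.3f}")
```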

(c) Distribution of Minimum Value in Small Samples. If
$p(x)$ is the elementary probability law for a random variable, $x$, and
$P_1 = P(x < x_1)$ the integral probability that $x < x_1$, then the
probability that at least one value of $x$ in a sample of $n$ will fall
below $x_1$ is $1 - (1 - P_1)^n$.

Suppose the consumer fixes a minimum strength, $x_1$, and proposes
to judge whether any part of the cloth has a strength below this
minimum by examining whether any of 5 strips cut from a single
piece of cloth have strength below $x_1$; it is easy to illustrate on the
present data how inadequate his test will be. The distribution of
Table I was graduated by a Pearson Type I curve having the following
equation:

$y = (21.9662)\,(x - 370.02)^{5.1218}\,(495.59 - x)^{1.[\ldots]}$

This curve was fitted by the method of moments, using the
moments given on p. 95 above. As seen from the column headed
“graduation” in Table I, the fit is satisfactory except at the abrupt
end.† As we are concerned below with the graduation at the lower
end of the distribution, it was unnecessary to attempt any adjustment
to the fit.

* The value of $\sqrt{b_1}$ in samples of 5 would be expected, on theoretical
grounds, to be smaller on the average than its value, −0.421, for the whole
distribution of 1000.

† If the 3 frequency groups centred at 485, 490 and 495 lb. respectively are
combined into one group and those centred at 400 and 405 into another, the
value of $\chi^2$ based on 17 groups is 14.45, and for $f = 17 - 5 = 12$ degrees
of freedom, the probability of obtaining a worse fit through chance
fluctuations equals 0.27.

In Table IV below, row (1) gives the number of strips per thousand
(taken from this graduating Type I curve) below the limits set at
the head of each column. Row (2) shows the probability that
at least one strip out of $n = 5$ sampled from a single piece will have
a strength below the corresponding minimum limit. Row (3)
gives the same probability, assuming that the distribution of strength
were normal with the observed mean of 454.09 lb. and standard
deviation 17.36.*

Table IV.

    Limit in lb. ($x_1$).             412.5  417.5  422.5  427.5  432.5  442.5  452.5

(1) Strips per 1000 below limit        13.8   26.8   47.5   78.3  121.4  250.1  434.1
(2) Probability that at least one
    strip in 5 is below limit:
      Type I $p(x)$                    .067   .127   .216   .335   .476   .763   .942
(3)   Normal $p(x)$                    .041   .087   .161   .279   .433   .764   .951

In the first place, comparing rows (2) and (3), it is seen that the
skewness of the strength distribution does make a considerable
modification in the probability of detecting a weak strip; that is
to say, this probability measure would be somewhat in error if
calculated from normal theory, using the mean and standard
deviation only. More important, however, than this is the warning
which can be drawn from a study of the figures in rows (1) and (2).

For a test to be of any practical value in detecting whether
parts of a piece of cloth are weaker than a specified fixed limit, it
is essential that it can be almost always relied on to draw attention
to failure to satisfy specification. Yet if the minimum were fixed,
for example, at $x_1 = 432.5$ lb., there would be only an even chance
that a test on 5 strips would show a break at tension below $x_1$,
although if the whole cloth had been cut up and tested, 12 per cent.
of the strips would have been below the minimum. We could be
reasonably sure (odds of 942 to 58 or 16 to 1) that 5 sample strips of
this cloth would contain at least 1 with strength below $x_1 = 452.5$
lb., but then nearly half of the material (434 strips out of 1000) has
strength below this level.
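
The figures used in this argument follow directly from the formula $1 - (1 - P_1)^n$. The sketch below is a minimal illustration: the strips per 1000 below each limit are taken as the running totals of the graduation column of Table I, the normal-theory comparison uses the exact normal integral rather than interpolation in Tippett’s table (so its last figure may differ slightly from the printed row), and the names `STRIPS_PER_1000_BELOW` and `prob_at_least_one_below` are assumptions of the sketch.

```python
from statistics import NormalDist

# Strips per 1000 below each limit x1 (lb.), taken as running totals
# of the graduation column of Table I.
STRIPS_PER_1000_BELOW = {
    412.5: 13.8, 417.5: 26.8, 422.5: 47.5, 427.5: 78.3,
    432.5: 121.4, 442.5: 250.1, 452.5: 434.1,
}
N_STRIPS = 5

# Normal population with the observed mean and standard deviation.
NORMAL = NormalDist(mu=454.09, sigma=17.36)


def prob_at_least_one_below(p_below, n=N_STRIPS):
    """Probability that at least one of n strips falls below the limit,
    given the probability p_below for a single strip."""
    return 1.0 - (1.0 - p_below) ** n


for limit, per_1000 in STRIPS_PER_1000_BELOW.items():
    skew = prob_at_least_one_below(per_1000 / 1000.0)
    normal = prob_at_least_one_below(NORMAL.cdf(limit))
    print(f"x1 = {limit:5.1f} lb.  graduated curve: {skew:.3f}  normal: {normal:.3f}")
```

At 432.5 lb. the first probability is close to one half although about 12 per cent. of the strips lie below the limit, which is the “even chance” referred to above.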

The moral is, of course, this: if it is important to avoid accepting
cloth with strength below $x_1$, the limit which must be passed by the
weakest strip out of 5 strips cut from a single piece should be fixed
at $x_2$, well above $x_1$.

For example, suppose the situation were as follows: (i) the shape
and standard deviation of the distribution of strength were as in
Table I, (ii) the mean strength were, however, uncertain, (iii) cloth
for which 5 per cent. or more of the strips have a strength below
$x_1 = 422.5$ lb. were regarded by the consumer as unsatisfactory.
Then the consumer might adopt the following rule: pass material
if no strips out of a sample of 5 break at a tension of less than
$x_2 = 452.5$ lb.; otherwise make a more extensive test on more strips.

It is seen from Table IV, (i) that the duck cloth of Table I,
with a mean strength of 454.1 lb., is just satisfactory (from the
graduation curve, only 4.7 per cent. of its 1000 tested strips were of
strength below the limit specified); (ii) for cloth of lower strength
the risk of passing the test described is very small, i.e., at most
about 0.05.

* Values obtained by interpolation in Tippett’s table reproduced on page
162 of (2).

It is clear, however, that the more extensive test will often be
called for when the cloth is as a whole satisfactory, although the
chance of this will decrease as the average strength increases. If,
for example, the average is 454.1 + 25.0 = 479.1 lb., or 26.6 lb. above
the limit $x_2$, the chance of failing to pass the test is still as large as
0.335.* If the distribution of strength had been normal with the
same standard deviation, the position would have been slightly
better, since the chance of unnecessary further testing would have been
only 0.279; in this sense negative skewness is disadvantageous.

Since, however, the form of the test suggested depends in
any case on a knowledge of the standard deviation between strips,
one may ask whether the use of the weakest strip in 5 has any
advantage over the use of the mean of 5. For instance, an alternative
to the above rule would be: pass if the mean strength of 5 strips
is above

$454.1 + 1.64 \times 17.36/\sqrt{5} = 454.1 + 1.64 \times 7.76 = 466.8$ lb.†;

otherwise carry out more extensive testing.

For cloth having a lower mean strength than the duck cloth of
Table I, the risk of passing this second test is, as before, 0.05 or less.
But now for a satisfactory cloth, with mean at 479.1 lb., since
(466.8 − 479.1)/7.76 = −1.58, the chance of unnecessary further
testing is only 0.06, i.e. considerably less than when using the
weakest strip as criterion.
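
The arithmetic of the rule based on the mean can be set out as a short sketch. It assumes, as the text does for practical purposes, that the mean of 5 strips is normally distributed with standard error 17.36/√5; the names `threshold` and `chance_of_further_testing` are assumptions of the sketch.

```python
import math
from statistics import NormalDist

SIGMA = 17.36            # within-piece standard deviation, lb.
N = 5                    # strips per sample
BORDERLINE_MEAN = 454.1  # mean strength of the just-satisfactory cloth, lb.
GOOD_MEAN = 479.1        # mean strength of a clearly satisfactory cloth, lb.

std_error = SIGMA / math.sqrt(N)
# Pass only if the sample mean exceeds this limit.
threshold = BORDERLINE_MEAN + 1.64 * std_error


def chance_of_further_testing(true_mean):
    """Probability that the mean of N strips falls at or below the
    threshold, taking the sample mean as normally distributed."""
    return NormalDist(mu=true_mean, sigma=std_error).cdf(threshold)


print(f"standard error of mean = {std_error:.2f} lb.")
print(f"threshold              = {threshold:.1f} lb.")
print(f"risk of passing borderline cloth       = "
      f"{1 - chance_of_further_testing(BORDERLINE_MEAN):.3f}")
print(f"chance of further testing, mean 479.1  = "
      f"{chance_of_further_testing(GOOD_MEAN):.3f}")
```

The last figure, about 0.06, is to be set against the 0.335 found above for the rule based on the weakest strip.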

It is seen, therefore, that a test on 5 strips from a piece (which 
in any case provides definite information about only that piece 
itself), (i) must leave a wide belt of uncertainty in any attempt 
to discriminate between satisfactory and unsatisfactory material, 
(ii) will be more efficient if the mean is used rather than the weakest
strip in 5. If the standard deviation of strength is uncertain as
well as the mean, the problem becomes more complicated, but it is 
quite clear in this case that no test applied to only 5 strips will give 
an adequate check-up on quality. In fact, there can be little doubt 
that now the most economic procedure would be to develop a system 
of guarantee specification of the type referred to by one of us on 
p. 83 of the discussion above. 

* Table IV shows that this is the chance that the weakest strip in 5 falls at
a distance of 454.1 − 427.5 = 26.6 lb. or more below the mean.

† For the distribution of Table I with $\beta_1 = 0.177$, the sampling
distribution of the mean of samples of 5 will have a coefficient $\beta_1$ of
$\tfrac{1}{5} \times 0.177 = 0.035$ and here for practical purposes may be
regarded as normally distributed. Thus 5 per cent. of sample means will
exceed the population mean by more than 1.64 × standard error of mean.

2. Variation among Tests from Different Pieces, Completed on
the Same Day.

Mr. Bayes’s Table I gives a large number of sets of 5 tests made
on strips cut from different pieces on the same day. The variation
within these samples of 5 will include both that within pieces
(believed to be skew) and that between pieces. The superposition of
these two contributions appears to give approximately normal
variation, as the following analysis will show.

For each of the 50 samples of 5 tests of warp strength given in
Table I, we have calculated:

(i) $\sqrt{b_1}$, defined above, the criterion of skewness.

(ii) $a$ = (Mean deviation)/(Standard deviation)
$= S|x - \bar{x}| \big/ \sqrt{n\,S(x - \bar{x})^2}$,
a criterion which R. C. Geary (8) has suggested may
be more useful than $b_2$ in detecting in small samples
a departure of the population $\beta_2$ from 3.

For normal theory in samples of 5:

(i) Mean $\sqrt{b_1} = 0$, $\sigma_{\sqrt{b_1}} = 0.612$. For the 50 samples, we
find: Mean = 0.031 ± 0.087; Standard deviation
= 0.648 ± 0.054; 24 positive and 26 negative values.

(ii) Mean ($a$) = 0.8385; $\sigma_a$ = 0.0687. For the 50 samples,
we find: Mean = 0.8429 ± 0.0097; Standard deviation
= 0.0674.

The figures after the ± sign are the standard errors calculated on
the assumption that strength is normally distributed. Thus the
standard error of the mean of 50 values of $\sqrt{b_1}$ is $0.612/\sqrt{50}$, while
the standard error of the standard deviation is approximately

$\tfrac{1}{2}\,\sigma_{\sqrt{b_1}}\sqrt{\{\beta_2(\sqrt{b_1}) - 1\}/50}$.*

It is clear that neither the mean nor the standard deviation of
the 50 values of $\sqrt{b_1}$ and $a$ differ significantly from what we should
expect were warp strength among strips from different pieces
normally distributed.
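
The second criterion and the quoted standard errors can be illustrated as follows. The function `geary_a` implements the ratio of mean deviation to standard deviation as defined above; the five test values passed to it are hypothetical and are not taken from Mr. Bayes’s tables. The constants 0.612, 2.5714 and 0.0687 are the normal-theory values quoted in the text and its footnote, and the form used for the standard error of a standard deviation, $\tfrac12\sigma\sqrt{(\beta_2 - 1)/N}$, is the usual large-sample approximation, assumed here.

```python
import math


def geary_a(sample):
    """Ratio of mean deviation to standard deviation (both with divisor n)."""
    n = len(sample)
    mean = sum(sample) / n
    abs_dev = sum(abs(x - mean) for x in sample)
    sum_sq = sum((x - mean) ** 2 for x in sample)
    return abs_dev / math.sqrt(n * sum_sq)


N_SAMPLES = 50
SIGMA_SQRT_B1 = 0.612   # s.d. of sqrt(b1) in normal samples of 5
BETA2_SQRT_B1 = 2.5714  # beta_2 of sqrt(b1) in normal samples of 5
SIGMA_A = 0.0687        # s.d. of a in normal samples of 5

se_mean_sqrt_b1 = SIGMA_SQRT_B1 / math.sqrt(N_SAMPLES)
se_sd_sqrt_b1 = 0.5 * SIGMA_SQRT_B1 * math.sqrt((BETA2_SQRT_B1 - 1.0) / N_SAMPLES)
se_mean_a = SIGMA_A / math.sqrt(N_SAMPLES)

# Hypothetical sample of 5 warp strengths (lb.), for illustration only.
example = [448.0, 455.0, 461.0, 452.0, 470.0]
print(f"a for the example sample        = {geary_a(example):.4f}")
print(f"s.e. of mean sqrt(b1)           = {se_mean_sqrt_b1:.3f}")
print(f"s.e. of s.d. of sqrt(b1)        = {se_sd_sqrt_b1:.3f}")
print(f"s.e. of mean a                  = {se_mean_a:.4f}")
```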

3. Conclusions. 

If the within-piece variation in strength is as skew as for the 
duck cloth given in Table I: 

(a) A statistical test is available which might detect departure 
from normality if tests on 5 strips from each of not less than 50 
pieces were available; it certainly does so for 200 pieces. 

(b) Little error would, however, be involved in using rough 
tests based on range in small samples, either to estimate the within- 
piece standard deviation or to form control charts. 

(c) The distribution of the strength of weakest strip in small
samples will be modified from the normal theory form in a direction
which somewhat decreases the efficiency of the test. In any case,
however, a more efficient test could be based on the mean of the
sample.

There is no evidence, on the data supplied by Mr. Bayes, that
the variation in strength among tests from different pieces completed
on the same day departs seriously from the normal. For this and
other reasons suggested by Mr. Bayes it would seem that the most
satisfactory method of routine sampling would be that which he
described as method 2 sampling (p. 75); i.e. taking one strip from
each of 5 pieces per week. If the variation within weeks is found
to become too large, it would then be always possible to examine
the within-piece variation in order to determine the source of trouble.

* See for example, (9) page 294. $\beta_2(\sqrt{b_1})$ for samples of $n = 5$ can be
obtained from the modification of R. A. Fisher’s result (6) given in equation
(3) of (10), and equals 2.5714.


References.

(1) E. S. Pearson. Biometrika, 1930, XXII, pp. 239-249.
(2) Tables for Statisticians and Biometricians, 1931, Part II.
(3) E. S. Pearson and J. Haines. I.A.R.S. Supplement to Journal of Royal
    Statistical Society, 1935, II, pp. 83-98.
(4) L. H. C. Tippett. Biometrika, 1925, XVII, pp. 364-387.
(5) L. H. C. Tippett. Tracts for Computers, 1927, No. XV.
(6) R. A. Fisher. Proc. Roy. Soc., 1930, A, CXXX, pp. 16-28.
(7) A. T. McKay. Biometrika, 1933, XXV, pp. 204-210.
(8) R. C. Geary. Ibid., 1936, XXVIII, pp. 295-307.
(9) E. S. Pearson. Ibid., 1929, XXI, pp. 294-302.
(10) E. S. Pearson. Ibid., 1931, XXII, pp. 423-424.



Problems Arising in the Analysis of a Series of Similar Experiments.

By W. G. Cochran, B.A. 

§ 1. Introduction.

An efficient type of modern field experiment is that in which a
replicated trial is laid down in the same year at a number of centres,
or carried out at the same centre independently throughout a number
of years. The statistical problems which arise in the interpretation
of the results of such a set of data are of wide generality. For any
treatment effect, we obtain at each centre an estimate $x$ and an
estimate $s$ of its standard error, based on $n$ degrees of freedom. As a
preliminary to more detailed examination, the experimenter wants
to estimate and test the significance of the mean treatment effect
and to find whether it has varied from centre to centre.

If the individual experiments are assumed to be equally accurate
and the estimated standard errors do not contradict this assumption,
the statistical treatment is easy and familiar. If there are $k$ centres,
the data for any particular treatment response may be analysed into

                                               D.F.
Mean response                                    1
Interaction of response with centres          (k − 1)
Local experimental error                         nk

The interaction of the treatment response may be tested against
the combined experimental errors, and the mean response may be
tested against either the interaction or the experimental error.
The interpretation of these two tests has been discussed by Fisher (1).

It will, however, be the exception to find the individual experiments
all of the same precision. The object of this paper is to discuss
the estimation of the mean response and the test of significance
of the interaction of the response with centres, where we do not wish
to assume that the standard errors are all equal.

§ 2. The Equations of Estimation.

The most general hypothesis to be considered is that the treatment
response at any centre is the sum of two parts, each normally
and independently distributed; one, representing the contribution of
local experimental errors, varies about zero with standard deviation
$\sigma_i$, while the other, which represents the responsiveness of the centre
to the treatment, varies about a general mean $\mu$ with standard
deviation $\sigma$. The parameters $\mu$ and $\sigma$ are the same for all centres, but
$\sigma_i$ varies from centre to centre. An estimate $s_i$ of $\sigma_i$, based on $n$
degrees of freedom, is available from the local analysis of variance.

For a single centre, the joint sampling distribution of $s_i$ and $x_i$
may be written, apart from the constant of integration,

$$\frac{s_i^{\,n-1}}{\sigma_i^{\,n}\sqrt{\sigma^2 + \sigma_i^2}}
\exp\left[-\frac{1}{2}\left\{\frac{n s_i^2}{\sigma_i^2}
+ \frac{(x_i - \mu)^2}{\sigma^2 + \sigma_i^2}\right\}\right] dx_i\, ds_i .$$

Hence the logarithm of the likelihood for all centres is

$$L = -n\,S \log \sigma_i - \tfrac{1}{2}\,S \log(\sigma^2 + \sigma_i^2)
- \tfrac{1}{2}\,S\left\{\frac{n s_i^2}{\sigma_i^2}
+ \frac{(x_i - \mu)^2}{\sigma^2 + \sigma_i^2}\right\},$$

the sum being taken over all $k$ centres.

The equations of estimation of $\mu$, $\sigma_i$, $\sigma$ are

$$\frac{\partial L}{\partial \mu} = S\,\frac{(x_i - \mu)}{\sigma^2 + \sigma_i^2} = 0, \qquad (1)$$

$$\frac{\partial L}{\partial \sigma_i} = -\frac{n}{\sigma_i}
- \frac{\sigma_i}{\sigma^2 + \sigma_i^2} + \frac{n s_i^2}{\sigma_i^3}
+ \frac{\sigma_i (x_i - \mu)^2}{(\sigma^2 + \sigma_i^2)^2} = 0, \qquad (2)$$

$$\frac{\partial L}{\partial \sigma} = -S\,\frac{\sigma}{\sigma^2 + \sigma_i^2}
+ S\,\frac{\sigma (x_i - \mu)^2}{(\sigma^2 + \sigma_i^2)^2} = 0. \qquad (3)$$

The equations are complicated and have no simple general solution.
The complication is mainly due to the fact that the value of
$x_i$ provides information about all three parameters $\mu$, $\sigma$, and $\sigma_i$.

§ 3. The Estimation of the Response when it is Assumed Constant at
all Centres.

If we assume $\sigma = 0$, the equations of estimation of $\mu$ and $\sigma_i$ give

$$S\,\frac{(x_i - \mu)}{\sigma_i^2} = 0, \qquad (4)$$

$$\sigma_i^2 = \frac{n s_i^2 + (x_i - \mu)^2}{n + 1}. \qquad (5)$$

Thus the equation of estimation of the mean response is

$$S\,\frac{(x_i - \mu)}{s_i^2 + \dfrac{(x_i - \mu)^2}{n}} = 0. \qquad (6)$$

If the values of $\sigma_i$ were known exactly, the sufficient estimate of
the mean would be given by the solution of

$$S\,\frac{(x_i - \mu)}{\sigma_i^2} = 0, \qquad (7)$$

i.e. by the weighted mean. Where the $\sigma_i$ have also to be estimated,
we do not simply replace $\sigma_i$ by $s_i$ in equation (7) to obtain an efficient
estimate, but make use of the extra information about $\sigma_i$ contained in
$x_i$. Equation (6) may be solved fairly quickly by successive
approximation, starting with the unweighted mean as a first estimate.
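
One way of carrying out the successive approximation is sketched below. It assumes equation (6) in the form $S\,(x_i - \mu)/\{s_i^2 + (x_i - \mu)^2/n\} = 0$ and iterates a weighted mean whose weights are recomputed from the current value of $\mu$; this particular iteration scheme, the function names, and the numerical data are illustrative assumptions and are not taken from the paper.

```python
def solve_mean_response(x, s, n, tol=1e-12, max_iter=1000):
    """Solve S (x_i - mu) / (s_i^2 + (x_i - mu)^2 / n) = 0 for mu by
    successive approximation, starting from the unweighted mean."""
    mu = sum(x) / len(x)
    for _ in range(max_iter):
        # Weights implied by the current approximation to mu.
        weights = [1.0 / (si ** 2 + (xi - mu) ** 2 / n) for xi, si in zip(x, s)]
        new_mu = sum(w * xi for w, xi in zip(weights, x)) / sum(weights)
        if abs(new_mu - mu) < tol:
            return new_mu
        mu = new_mu
    raise RuntimeError("successive approximation did not converge")


def estimating_function(mu, x, s, n):
    """Left-hand side of the estimating equation at a trial value mu."""
    return sum((xi - mu) / (si ** 2 + (xi - mu) ** 2 / n) for xi, si in zip(x, s))


# Hypothetical responses and estimated standard errors at four centres,
# each standard error based on n = 6 degrees of freedom.
responses = [10.0, 12.0, 15.0, 11.0]
std_errors = [1.0, 2.0, 4.0, 1.5]
degrees_of_freedom = 6

mu_hat = solve_mean_response(responses, std_errors, degrees_of_freedom)
print(f"unweighted mean             = {sum(responses) / len(responses):.4f}")
print(f"solution of the equation    = {mu_hat:.4f}")
print(f"estimating function at root = "
      f"{estimating_function(mu_hat, responses, std_errors, degrees_of_freedom):.2e}")
```

Each step is an ordinary weighted mean, so the calculation is easily done by hand for a moderate number of centres; a few cycles are usually enough for the weights to settle.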

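To find the amount of information in the estimated mean and
perform tests of significance, it is necessary to calculate the sampling
variance of the solution, $\hat{\mu}$ say, of equation (6). For a given set of $\sigma_i$
and $\mu$, this would at first sight appear to depend on all these unknown
parameters, but Bartlett (2) has shown how to use the information
available about the unknown $\sigma_i$ to obtain the sampling variance of
$\hat{\mu}$ in terms of the single unknown $\mu$. In the joint distribution of $x$
and $s$ at any centre

$$\frac{s^{\,n-1}}{\sigma_i^{\,n+1}}
\exp\left\{-\frac{n s^2 + (x - \mu)^2}{2\sigma_i^2}\right\} dx\, ds \qquad (8)$$

he writes $\Sigma = ns^2 + (x - \mu)^2$ and substitutes for $s$ in terms of $\Sigma$.
This gives for the joint distribution of $x$ and $\Sigma$

$$\frac{1}{\sigma_i^{\,n+1}}\, e^{-\Sigma/2\sigma_i^2}\, \Sigma^{(n-2)/2}
\{1 - (x - \mu)^2/\Sigma\}^{(n-2)/2}\, dx\, d\Sigma . \qquad (9)$$

It will be noted that the distribution of $x$ for fixed $\Sigma$ depends only
on the unknown mean $\mu$. Thus the variance of $\hat{\mu}$ for a fixed $\Sigma$,
which is known if $\mu$ is known, will depend only on the unknown mean
$\mu$, and not on the $\sigma_i$.

To find the variance of $\hat{\mu}$, write

$$X(\mu) = S\,\frac{(x - \mu)}{\Sigma}. \qquad (10)$$

Now

$$0 = X(\hat{\mu}) = X(\mu) + (\hat{\mu} - \mu)\frac{\partial X}{\partial \mu} + \ldots \qquad (11)$$

Thus if the number of centres is large,

$$V(\hat{\mu}) = E(X^2) \Big/ E^2\!\left(\frac{\partial X}{\partial \mu}\right). \qquad (12)$$

From equation (9), the variance of $x$ for fixed $\Sigma$ is $\Sigma/(n + 1)$. Thus

$$E(X^2) = S\,E\left\{\frac{(x - \mu)^2}{\Sigma^2}\right\}
= \frac{1}{(n + 1)}\, S\left(\frac{1}{\Sigma}\right) \qquad (13)$$

and

$$E\left(\frac{\partial X}{\partial \mu}\right)
= -\frac{(n - 1)}{(n + 1)}\, S\left(\frac{1}{\Sigma}\right), \qquad (14)$$

so that

$$V(\hat{\mu}) = \frac{(n + 1)}{(n - 1)^2} \Big/ S\left(\frac{1}{\Sigma}\right). \qquad (15)$$

The average amount of information for a fixed set of $\sigma_i$ is
$E\{1/V(\hat{\mu})\}$ and is easily found to be

$$\frac{(n - 1)}{(n + 1)}\, S\left(\frac{1}{\sigma_i^2}\right). \qquad (16)$$

Thus the average fraction of information lost through the inaccuracy
of the weights is $2/(n + 1)$.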

An alternative which suggests itself to the use of the maximum
likelihood solution is the weighted mean

$$\hat{\mu}_w = S\left(\frac{x_i}{s_i^2}\right) \Big/ S\left(\frac{1}{s_i^2}\right). \qquad (17)$$

This has the advantage that it can be calculated directly and is a
familiar type of mean in statistical work, and it is worth while
estimating its efficiency. For a fixed set of $s_i$

$$V(\hat{\mu}_w) = S\left(\frac{\sigma_i^2}{s_i^4}\right) \Big/ S^2\left(\frac{1}{s_i^2}\right), \qquad (18)$$

so that a knowledge of the $s_i$ does not, in this case, enable us to
dispense with the knowledge of the $\sigma_i$, though it has the advantage
that for fixed $s_i$, $\hat{\mu}_w$ is normally distributed. Since

$$E\left(\frac{1}{s_i^2}\right) = \frac{n}{n - 2}\,\frac{1}{\sigma_i^2}
\quad\text{and}\quad
E\left(\frac{1}{s_i^4}\right) = \frac{n^2}{(n - 2)(n - 4)}\,\frac{1}{\sigma_i^4},$$

the average variance of $\hat{\mu}_w$ for a fixed set of $\sigma_i$ is, provided $n > 4$,

$$\frac{(n - 2)}{(n - 4)} \Big/ S\left(\frac{1}{\sigma_i^2}\right). \qquad (19)$$

Thus the amount of information in the weighted mean is

$$\frac{(n - 4)}{(n - 2)}\, S\left(\frac{1}{\sigma_i^2}\right). \qquad (20)$$

Comparison with (16) shows that the superior efficiency of the maximum
likelihood solution is equivalent to having 3 extra degrees of
freedom in the estimates $s_i$.

For $n = 4$, expression (19) gives an infinite value for the variance
of the weighted mean $\hat{\mu}_w$. This is not correct, since (18) shows that
for a fixed set of $\sigma_i$ the variance of $\hat{\mu}_w$ cannot exceed the greatest
of the variances $\sigma_i^2$, i.e. the weighted mean cannot do worse than give
as the estimate of $\mu$ the most variable single value of $x_i$. For $n = 4$,
the average variance of $\hat{\mu}_w$ can be shown to be

[expression (21) is not recoverable from the source]

This value lies between the most and least accurate of the individual
estimates at the various centres. What is happening is that
individual low values of $s_i$ are turning up so frequently that usually
all the information about $\mu$ is being derived from a single centre.
Thus the percentage information retained tends to zero as the number
of centres increases.



The percentage efficiencies of $\hat{\mu}$ and $\hat{\mu}_w$ are shown for small values
of $n$ in the table below.

Table I.

Percentage Efficiencies of $\hat{\mu}$ and $\hat{\mu}_w$.

$n$               8     10     15

$\hat{\mu}$      78     82     88
$\hat{\mu}_w$    67     75     85

For $n$ greater than 15 there is little to choose between the two estimates, but for $n$ less than 10 the increase in efficiency of the maximum likelihood solution is worth the extra labour. For values of $n$ between 2 and 6 a good deal of information is being lost in the process of estimation of the weights. As these cases may be of practical importance (e.g. $n = 2$ might represent a set of 3 × 3 Latin squares), it is worth considering the relative efficiency of two other types of mean which suggest themselves.
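The entries of Table I can be checked numerically. The short Python sketch below assumes, as the comparison of (16) with (20) implies, that with a large number of centres the fraction of information retained is $(n-1)/(n+1)$ for the maximum likelihood solution and $(n-4)/(n-2)$ for the weighted mean. The function names are illustrative only and are not part of the paper.

```python
def efficiency_max_likelihood(n):
    """Fraction of information retained by the maximum likelihood solution,
    assumed here to be (n - 1)/(n + 1) for a large number of centres."""
    return (n - 1) / (n + 1)


def efficiency_weighted_mean(n):
    """Fraction of information retained by the weighted mean,
    assumed here to be (n - 4)/(n - 2); the closed form needs n > 4."""
    if n <= 4:
        raise ValueError("the closed form requires n > 4")
    return (n - 4) / (n - 2)


# Print the percentage efficiencies for the values of n shown in Table I.
for n in (8, 10, 15):
    print(n,
          f"{100 * efficiency_max_likelihood(n):.1f}",
          f"{100 * efficiency_weighted_mean(n):.1f}")
```

Rounded to whole numbers, the printed percentages agree with the two rows of Table I.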

One is the unweighted mean. This always retains a finite fraction of the information, the fraction decreasing as the true accuracies of the individual experiments diverge, and is not subject to any loss due to estimation of weights. The other method is to fix arbitrarily an upper limit to the weights, and below that to weight inversely as the estimated variance. This is equivalent to recognizing that in practice there is a limit to the accuracy with which an individual experiment may be carried out, and that very low values of $s_i$ are likely to be under-estimates of the corresponding $\sigma_i$. The method has the advantage that no single experiment exerts too predominating an influence on the mean, while bad experiments are properly scaled down in weight; on the other hand, it has an element of arbitrariness in the choice of the upper limit. A comparison of the four types of mean for low values of $n$ is given in the next section, the number of centres being assumed to be large.




§ 4. The Relative Efficiencies of Four Types of Mean.

The relative efficiencies of the weighted mean and the maximum likelihood solution have been shown to be

$$\frac{n-4}{n-2} \quad\text{and}\quad \frac{n-1}{n+1}$$

respectively, where $n$ is the number of degrees of freedom in the estimates $s_i$. The variance of the unweighted mean is

$$\frac{S(\sigma_i^2)}{k^2},$$

where $k$ is the number of centres, and can be calculated for any given set of $\sigma_i$.








1937] Analysis of a Series of Similar Experiments. 


107 


To calculate the efficiency of the weighted mean with an arbitrary upper limit, let the minimum true error variance be guessed as $\sigma_0^2$. Then in sampling from a set of experiments in which the true variance is $\sigma^2$, we take the weight $w$ as $1/\sigma_0^2$ whenever $s^2 < \sigma_0^2$ and as $1/s^2$ whenever $s^2 > \sigma_0^2$. The variance of the mean $S(wx)/S(w)$ for any given set of $\sigma^2$ and fixed weights is

$$S(w^2\sigma^2)/S^2(w)$$

and the average variance for a given set of $\sigma^2$ is

$$S(\overline{w^2}\,\sigma^2)/S^2(\bar w),$$

the bars denoting mean values over the sampling distribution of $s^2$.

For $n = 6$, for example, the probability that $s^2 \le \sigma_0^2$ is (cf. (3))

$$P(s^2 \le \sigma_0^2) = 1 - e^{-\frac12\chi^2}\left\{1 + \tfrac12\chi^2 + \tfrac12\left(\tfrac12\chi^2\right)^2\right\}$$

where $\tfrac12\chi^2 = 3\sigma_0^2/\sigma^2$.

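We require also the mean value of $1/s^2$ in the range $(\sigma_0^2, \infty)$. This is found to be

$$\frac{3}{2\sigma^2}\, e^{-\frac12\chi^2}\left\{1 + \tfrac12\chi^2\right\}.$$

Thus

$$\bar w = \frac{1}{\sigma_0^2}\left[1 - e^{-\frac12\chi^2}\left\{1 + \tfrac12\chi^2 + \tfrac12\left(\tfrac12\chi^2\right)^2\right\}\right] + \frac{3}{2\sigma^2}\, e^{-\frac12\chi^2}\left\{1 + \tfrac12\chi^2\right\}.$$

Similarly

$$\overline{w^2} = \frac{1}{\sigma_0^4}\left[1 - e^{-\frac12\chi^2}\left\{1 + \tfrac12\chi^2 + \tfrac{1}{2!}\left(\tfrac12\chi^2\right)^2\right\}\right] + \frac{9}{2\sigma^4}\, e^{-\frac12\chi^2}.$$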


The relative efficiencies of these four types of mean will depend on the distribution of experimental errors $\sigma_i^2$. To obtain some actual figures, the efficiencies have been calculated for some sets of hypothetical values of $\sigma_i^2$ which are intended to cover the range likely to occur in practice. It is assumed that the $k$ centres are divided into three groups as regards accuracy: a number $\lambda k$ have the same experimental variance $lv$, $\mu k$ have variance $mv$, while the remaining $\nu k$ have variance $nv$. We have

$$\lambda + \mu + \nu = 1 \quad\text{and}\quad l < m < n.$$

The sets of values assigned to $(\lambda, \mu, \nu)$ are (0.2, 0.6, 0.2) and (0.3, 0.4, 0.3), the first set representing, for instance, the case in which 60 per cent. of the experiments have the same accuracy, while 20 per cent. are less accurate and 20 per cent. more accurate. For each of these sets, $(l, m, n)$ have been given the values ($\frac12$, 1, 2) and ($\frac14$, 1, 4) respectively. The case $(\lambda, \mu, \nu) = (0.2, 0.6, 0.2)$, $(l, m, n) = (\frac12, 1, 2)$ means, for instance, that $\frac15$ of the centres have experimental variances $\frac12 v$, $\frac35$ have variances $v$ and the remaining $\frac15$ have variances $2v$. The four cases resulting cover a fairly wide range of variation in the accuracy of individual experiments, the relative accuracy of the best and the worst experiments ranging from 4 to 16.
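For the unweighted mean the efficiency in each hypothetical case follows directly from the group proportions and variance multipliers. The sketch below is an illustration under a stated assumption: efficiency is measured against the total information $S(1/\sigma_i^2)$ that would be available if the true variances were known. Since the unweighted mean of $k$ centres has variance $S(\sigma_i^2)/k^2$, its efficiency is then $1/\{(\lambda l + \mu m + \nu n)(\lambda/l + \mu/m + \nu/n)\}$, independent of $v$ and $k$. The function name and argument layout are hypothetical.

```python
from fractions import Fraction as F


def unweighted_mean_efficiency(proportions, multipliers):
    """Efficiency of the unweighted mean relative to the information
    S(1/sigma_i^2), for centres split into groups.

    proportions -- group fractions (lambda, mu, nu), summing to 1
    multipliers -- variance multipliers (l, m, n) of the common unit v
    """
    mean_variance = sum(p * c for p, c in zip(proportions, multipliers))
    mean_inverse = sum(p / c for p, c in zip(proportions, multipliers))
    return 1 / (mean_variance * mean_inverse)


# The four hypothetical cases described in the text.
for props in ((F(1, 5), F(3, 5), F(1, 5)), (F(3, 10), F(2, 5), F(3, 10))):
    for mults in ((F(1, 2), 1, 2), (F(1, 4), 1, 4)):
        eff = unweighted_mean_efficiency(props, mults)
        print(props, mults, f"{100 * float(eff):.1f} per cent")
```

Exact fractions are used only to keep the arithmetic transparent; ordinary floating-point inputs work equally well.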

Comparison of the four types of mean suggested is made in Table 
II for n = 6, 4 and 2. 

Table II. 

Relative Efficiencies of Four Types of Mean. 









it is assumed that there is no variation in response from experiment 
to experiment, the weighted mean, weighting inversely as the esti¬ 
mated variances, may be recommended if 15 or more degrees of freedom 
are available in the estimates of the weights. With less than 15 
degrees of freedom one of the other means is advisable, and each has 
something in its favour. 

The maximum likelihood solution is satisfactory from the point 
of view of information for values of n as low as 6, though it is slightly 
more tedious to calculate than the other means. The unweighted 
mean has simplicity to commend it, and is particularly suitable with 
sets of experiments which do not vary widely in precision and with 
low values of $n$. For values of $n$ below 6 the weighted mean with a fixed upper limit is the most accurate of the three, if the limit can be chosen to represent the accuracy of the best group of experiments. The difficulty of assigning a standard error to this mean is, however, a serious disadvantage.

§ 5 . The Test of Significance of the Mean Response. 

The tests given here are strictly appropriate to a large number of centres; the mean is assumed to be normally distributed, and is compared in the normal probability table with an unbiased estimate of its variance. In the analogous case of equal precision, the approximation is equivalent to replacing the $t$-distribution for $nk$ degrees of freedom by the normal distribution. The agreement between the exact and approximate tests may not be as good with unequal as with equal variances, but may be expected to be satisfactory unless $k$ is small. The case $k = 2$ is being investigated.

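The estimated variances of the weighted mean and the maximum likelihood solution have already been found. The average variance of the weighted mean is $\frac{n-2}{n-4} \big/ S\left(\frac{1}{\sigma_i^2}\right)$. Replacing $S\left(\frac{1}{\sigma_i^2}\right)$ by an unbiased estimate in terms of $s_i^2$, namely $\frac{n-2}{n}\, S\left(\frac{1}{s_i^2}\right)$, we get for the standard error

$$\sqrt{\frac{n}{(n-4)\,S(1/s_i^2)}}.$$

In the maximum likelihood solution, the estimate of the experimental variance $\sigma_i^2$ at any centre is taken as

$$\{n s_i^2 + (x_i - \hat\mu)^2\}/(n+1)$$

and the standard error of the mean may be written

$$\sqrt{\frac{n+1}{n-1}} \Big/ \sqrt{S\left\{\frac{n+1}{n s_i^2 + (x_i - \hat\mu)^2}\right\}}$$

where $\hat\mu$ is the estimate of the mean.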





The estimated standard error of the unweighted mean $\bar x$ is

$$s_{\bar x} = \frac{\sqrt{S(s_i^2)}}{k}.$$

It should be noted that the distribution of $\bar x / s_{\bar x}$ depends on the ratios of the unknown $\sigma_i^2$, and is not that of Student's $t$ unless the $\sigma_i^2$ are all equal. Where the product $nk$ is sufficiently small that a $t$-test is indicated, a first approximation to the exact test may be found by a device which has been used by Fairfield Smith (4). The variance of any individual estimate $s_i^2$ is

$$V(s_i^2) = \frac{2\sigma_i^4}{n}.$$

Hence an estimate of the variance of $s_{\bar x}^2$ is

$$\frac{2\,S(s_i^4)}{n k^4}.$$

But if $n_{\bar x}$ is the number of degrees of freedom appropriate to $s_{\bar x}$, an estimate of $V(s_{\bar x}^2)$ is

$$\frac{2 s_{\bar x}^4}{n_{\bar x}} = \frac{2\,S^2(s_i^2)}{n_{\bar x} k^4}.$$

Thus the relative precision of $s_{\bar x}$ is estimated by assigning to it a number of degrees of freedom equal to

$$n_{\bar x} = \frac{n\,S^2(s_i^2)}{S(s_i^4)}.$$

This number only attains its maximum, $nk$, if all experiments have the same $s^2$. It reaches its minimum, $n$, if a single centre is much less accurate than any of the others, and in this case indicates, quite correctly, a $t$-test against $n$ instead of $nk$ degrees of freedom. In general, the integral part of $n_{\bar x}$ may be taken as the number of degrees of freedom in $s_{\bar x}$. No examination of the closeness of this approximation has yet appeared in print, but consideration of the case with two centres only indicates that the approximation may over-estimate on the average the probability of a deviation arising by chance.
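The device described above is easily computed. The following Python sketch takes the equivalent number of degrees of freedom to be $n\,S^2(s_i^2)/S(s_i^4)$, an expression assumed here because it reproduces the stated maximum $nk$ (all $s_i^2$ equal) and minimum $n$ (one centre dominating). The function names are illustrative.

```python
import math


def unweighted_mean_standard_error(s2):
    """Estimated standard error of the unweighted mean of k centres,
    sqrt(S(s_i^2)) / k, where s2 holds the estimated variances s_i^2."""
    k = len(s2)
    return math.sqrt(sum(s2)) / k


def effective_degrees_of_freedom(s2, n):
    """Equivalent degrees of freedom n * S(s_i^2)^2 / S(s_i^4), each
    s_i^2 being based on n degrees of freedom (assumed form)."""
    total = sum(s2)
    return n * total * total / sum(v * v for v in s2)


# Four centres with equal estimated variances: the maximum nk = 20.
print(effective_degrees_of_freedom([2.0, 2.0, 2.0, 2.0], n=5))
# One very inaccurate centre: the value falls towards n = 5.
print(effective_degrees_of_freedom([50.0, 1.0, 1.0, 1.0], n=5))
```

Following the text, the integral part of the result would be used when entering the $t$ table.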

I have been unable to find any method of obtaining a simple 
estimate of the standard error of the weighted mean with fixed 
upper limit. Even if a method were found, a weighted mean with 
a badly chosen upper limit would be assigned an under-estimate of 
its standard error. Unless this difficulty can be satisfactorily over¬ 
come, this mean is ruled out where an exact test of significance is 
required. 

§ 6 . The Test of Significance of the Variation in Response from 
Centre to Centre . 

A test could be obtained by solving equations (1), (2) and (3) for $\sigma$ and using the solution, $\hat\sigma$, as a test criterion, the significance levels being obtained from the frequency distribution of $\hat\sigma$ when $\sigma = 0$. The test would be efficient if the mathematical specification of the problem set up in § 2 conformed to practice, but in any case the method cannot be used owing to the complexity of the equations.

If the mean response $\mu$ were known, the values $(x - \mu)/s$ would, in the absence of any variation in response, be distributed as $t$ with $n$ degrees of freedom, and a sensitive test could presumably be based on the value of

$$Q = S\left\{\frac{(x - \mu)^2}{s^2}\right\}.$$

Where the value of $\mu$ is unknown, analogy with the analysis of variance suggests that the appropriate estimate of it for this purpose is the weighted mean $\mu_w = S\left(\frac{x}{s^2}\right) \Big/ S\left(\frac{1}{s^2}\right)$, which is the value of $\mu$ which minimizes $Q$. With this value inserted, $Q$ may be written

$$Q = S\left\{\frac{(x - \mu_w)^2}{s^2}\right\}.$$

The efficiency of this quantity as a test criterion will depend on the 
type of variation in response which occurs in practical applications, 
but it seems reasonable on common-sense grounds. It may be noted 
that the same type of expression is used to test the departure from 
independence in a 2 X N contingency table (cf. ( 5 ), § 21 ). 

If the weights were known exactly, $Q$ would follow the $\chi^2$ distribution with $(k - 1)$ degrees of freedom, $k$ being the number of centres. In general, writing $t = (x - \mu)/s$, $Q$ may be written

$$Q = S(t^2) - \frac{S^2(t/s)}{S(1/s^2)},$$

so that it is distributed as the sum of the squares of $k$ values of $t$, less a correction term which is a weighted mean of the values of $t$. A good approximation to the distribution of $Q$ should be obtained by replacing the weighted mean in the correction term by an unweighted mean. In particular

$$\text{Mean}(Q) = \frac{n}{n-2}\{k - 1 - \ldots\} < \text{Mean}\{S(t - \bar t)^2\} = \frac{n}{n-2}(k - 1),$$

so that in replacing $Q$ by $S(t - \bar t)^2$ we are probably tending to over-estimate slightly the probability of a discrepancy arising by chance. Further

$$\text{Variance}\{S(t - \bar t)^2\} = \frac{2(k-1)\,n^2(n-1)}{(n-2)^2(n-4)}\left[1 - \frac{3}{k(n-1)}\right].$$

Thus $\frac{n-2}{n}\, S(t - \bar t)^2$ has the same mean as $\chi^2$ with $(k - 1)$ degrees of freedom, but its variance is too large, approximately in the ratio $\frac{n-1}{n-4}$. A transformation which leads to the same mean and variance as $\chi^2$ is obtained by putting

$$\chi_w^2 = (k - 1) + \sqrt{\frac{n-4}{n-1}}\left\{\frac{n-2}{n}\,Q - (k - 1)\right\}.$$


The distribution of $\chi_w^2$ and the tabular $\chi^2$ tend to the same normal distribution, for any value of $n$, as $k$ tends to infinity; they also tend to coincidence, for any $k$, as $n$ tends to infinity, and have the same mean and variance for all values of $n$ and $k$, except for very small values of both, in which the additional factor $\left[1 - \frac{3}{k(n-1)}\right]$ may be brought into the transformation. The agreement between the distribution of $\chi_w^2$ and $\chi^2$ may be expected to be at its worst for low values of both, since $\chi_w^2$ has a lower limit $(k - 1)\left(1 - \sqrt{\frac{n-4}{n-1}}\right)$ instead of zero. The difference even here is likely to be small for moderate values of $n$ and $k$; for $n = 10$, $k = 11$, for instance, the lower limit of $\chi_w^2$ is 1.835 and the probability of getting a value of $\chi^2$ lower than this for 10 degrees of freedom is only 0.0025.

It is therefore suggested that the transformation

$$\chi^2 = (k - 1) + \sqrt{\frac{n-4}{n-1}}\left\{\frac{n-2}{n}\,Q - (k - 1)\right\}$$

may be used in testing the significance of the variation in response from centre to centre.
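The computation of the criterion can be sketched in a few lines of Python. The sketch assumes that $Q$ is the weighted sum of squares of deviations from the weighted mean, with weights $1/s_i^2$, and that the transformation has the form $\chi^2 = (k-1) + \sqrt{(n-4)/(n-1)}\,\{(n-2)Q/n - (k-1)\}$, which is consistent with the lower limit 1.835 quoted for $n = 10$, $k = 11$. Function names are illustrative, and the resulting value would still have to be referred to a table of $\chi^2$ with $k - 1$ degrees of freedom.

```python
import math


def weighted_mean(x, s2):
    """Weighted mean of the responses x with weights 1/s_i^2."""
    weights = [1.0 / v for v in s2]
    return sum(w * xi for w, xi in zip(weights, x)) / sum(weights)


def q_statistic(x, s2):
    """Weighted sum of squares of deviations from the weighted mean."""
    mu_w = weighted_mean(x, s2)
    return sum((xi - mu_w) ** 2 / v for xi, v in zip(x, s2))


def q_to_chi_square(q, n, k):
    """Transform Q so that it may be referred to chi-square with k - 1
    degrees of freedom; the assumed form breaks down unless n > 4."""
    if n <= 4:
        raise ValueError("the transformation requires n > 4")
    scale = math.sqrt((n - 4) / (n - 1))
    return (k - 1) + scale * ((n - 2) / n * q - (k - 1))


# Smallest attainable value of the transformed criterion for n = 10, k = 11.
print(round(q_to_chi_square(0.0, n=10, k=11), 3))
```

With two centres the value returned by `q_statistic` reduces to $(x_1 - x_2)^2/(s_1^2 + s_2^2)$.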

For a given value of $n$, this approximation will be worst when there are only two centres. The difference is, however, still on the side of declaring too few significant results, for $Q = \frac{(x_1 - x_2)^2}{s_1^2 + s_2^2}$ and the $Q,\chi^2$ transformation is based on the $t^2$ distribution with $n$ degrees of freedom, whereas the exact distribution of $Q$ is probably better approximated by using in $t^2$ a number of degrees of freedom lying between $n$ and $2n$, as indicated by the method suggested in § 5. Even with only two centres the $Q,\chi^2$ transformation will thus avoid the danger of obtaining too many significant results, though the method of determining the equivalent number of degrees of freedom is to be recommended as more sensitive, and further work on this important particular case is needed.

In general, the use of $Q$ as a test criterion is inadvisable for values of $n$ below 6; the $Q,\chi^2$ transformation breaks down when $n = 4$, and the variance of the $t$-distribution itself is infinite when $n = 2$. In these cases it is best to obtain the individual values of $t = (x - \mu)/s$ and compare these with the tabulated $t$-distribution. The estimate of $\mu$ used should be one of the three other types of mean suggested, but not the weighted mean. The transformation from $Q$ to $\chi^2$ cannot, however, be used with these means, since they will always give higher values of $Q$ than the weighted mean.


§ 7. Estimation of the Mean Response when it Varies from Centre to Centre.

If it cannot be assumed that the interactions do not exist, the question of estimation of the mean response is more difficult. Equation (1)

$$S\left(\frac{x_i - \mu}{\sigma^2 + \sigma_i^2}\right) = 0$$

indicates that a kind of semi-weighted mean is appropriate, but the complete solution of equations (1), (2) and (3) would be very tedious. If a fairly efficient solution is required in a particular case, there will probably be very little information lost if the $s_i^2$ are used as estimates of $\sigma_i^2$ and $\mu$ and $\sigma^2$ are estimated from the simultaneous equations

$$S\left(\frac{x_i - \mu}{\sigma^2 + s_i^2}\right) = 0, \qquad S\left(\frac{1}{\sigma^2 + s_i^2}\right) = S\left\{\frac{(x_i - \mu)^2}{(\sigma^2 + s_i^2)^2}\right\}.$$

The solution of these equations is as a rule quite rapid.
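One way of solving the pair of equations numerically is sketched below in Python. It is an illustration under stated assumptions, not the author's procedure: for any trial value of $\sigma^2$ the first equation gives $\mu$ as the mean weighted by $1/(\sigma^2 + s_i^2)$, and the second is then solved for $\sigma^2$ by bisection. Taking $\sigma^2 = 0$ when the second equation has no positive root is a convention adopted here, and the function name is hypothetical.

```python
def semi_weighted_estimates(x, s2, tol=1e-10, max_iter=200):
    """Return (mu, sigma2) solving
        S((x_i - mu) / (sigma2 + s_i^2)) = 0
        S(1 / (sigma2 + s_i^2)) = S((x_i - mu)^2 / (sigma2 + s_i^2)^2)
    with sigma2 set to zero when no positive root exists."""

    def mean_and_gap(sigma2):
        w = [1.0 / (sigma2 + v) for v in s2]
        mu = sum(wi * xi for wi, xi in zip(w, x)) / sum(w)
        # Positive gap: the trial sigma2 is too small.
        gap = sum((wi * (xi - mu)) ** 2 for wi, xi in zip(w, x)) - sum(w)
        return mu, gap

    mu, gap = mean_and_gap(0.0)
    if gap <= 0.0:
        return mu, 0.0

    # Bracket the root by doubling, then bisect.
    lo, hi = 0.0, max(s2)
    while mean_and_gap(hi)[1] > 0.0:
        lo, hi = hi, 2.0 * hi
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if mean_and_gap(mid)[1] > 0.0:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    sigma2 = 0.5 * (lo + hi)
    return mean_and_gap(sigma2)[0], sigma2


print(semi_weighted_estimates([0.0, 2.0, 4.0, 6.0], [1.0, 1.0, 1.0, 1.0]))
```

When all the $s_i^2$ are equal to a common value $c$, the equations reduce to $\mu = \bar x$ and $\sigma^2 = S(x - \bar x)^2/k - c$, which gives a convenient check on the sketch.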

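In general, a simpler solution will be wanted, and it is worth comparing the efficiencies of the unweighted and weighted means. Consider first the case in which the values of $\sigma^2$ and $\sigma_i^2$ are known exactly. The variance of the semi-weighted mean

$$S\left(\frac{x_i}{\sigma^2 + \sigma_i^2}\right) \Big/ S\left(\frac{1}{\sigma^2 + \sigma_i^2}\right) \quad\text{is}\quad 1 \Big/ S\left(\frac{1}{\sigma^2 + \sigma_i^2}\right).$$

The variance of the unweighted mean is

$$\frac{1}{k}\left\{\sigma^2 + \frac{1}{k}\,S(\sigma_i^2)\right\},$$

while that of the weighted mean

$$S\left(\frac{x_i}{\sigma_i^2}\right) \Big/ S\left(\frac{1}{\sigma_i^2}\right) \quad\text{is}\quad \left\{1 + \sigma^2\, S\left(\frac{1}{\sigma_i^4}\right) \Big/ S\left(\frac{1}{\sigma_i^2}\right)\right\} \Big/ S\left(\frac{1}{\sigma_i^2}\right).$$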





The relative efficiencies of these three types of mean will depend on the distribution of experimental errors $\sigma_i^2$ and on the interaction $\sigma^2$. To obtain some actual figures, the efficiencies have been calculated for the set of experimental variances used in Table II, giving to $(\lambda, \mu, \nu)$ the additional values (0.1, 0.8, 0.1) and to $(l, m, n)$ the additional values ($\frac13$, 1, 3). This provides nine instead of four examples of variation in experimental errors.

The interaction variance has now to be considered. For each of the nine selected cases, the efficiencies are continuous functions of the interaction variance $rv$, say. The mean experimental variance for any set $(\lambda, \mu, \nu)$, $(l, m, n)$ is

$$(\lambda l + \mu m + \nu n)v = \kappa v \text{ (say)}$$

and for each of the nine cases, the efficiencies have been calculated for $r = 0, \frac12\kappa, \kappa, 2\kappa$, so that the interaction variance is respectively 0, $\frac12$, 1, 2 times the mean experimental variance. This gives a 3 × 3 × 4 table of 36 pairs of entries.


Table III. 

Efficiencies of the Unweighted and Weighted Means . 








loss of information being 17 per cent, in the worst case. For i = 2 
the loss of information with the unweighted mean is small in all 
cases. 

In practice the situation is much more favourable to the unweighted mean than Table III indicates. For the weights in the semi-weighted mean and in the weighted mean have to be estimated, and the estimation results in a loss of information on these means to which there is no corresponding loss on the unweighted mean. In particular, the information in the weighted mean has been shown above to be decreased in the ratio $\frac{n-4}{n-2}$, where $n$ is the number of degrees of freedom in the local experimental errors. With $n = 16$, for instance, the efficiencies of the weighted means in Table III have to be multiplied by $\frac67$. This would make the unweighted mean superior to the weighted mean throughout Table III, except in a few cases in which there was no interaction.

These results indicate that the unweighted mean may safely be recommended where we do not assume that interactions are non-existent, particularly since, with a large number of centres, it is usually necessary to keep the individual experiments small, so that there will rarely be as many as 20 degrees of freedom in the estimates of the local experimental errors.

Where the response varies from centre to centre, it is usually appropriate to test the mean response by comparing it with the variation from centre to centre, especially if the centres constitute a random sample from all possible centres. The usual expression $s_{\bar x}^2 = S(x - \bar x)^2/k(k - 1)$, taken over all centres, is an unbiased estimate of the variance of the unweighted mean $\bar x$. The $t$-test with $(k - 1)$ degrees of freedom will, however, lead to too many significant results unless the interaction variance is large compared with the local experimental variances, since with one very inaccurate experiment, for instance, the estimate $s_{\bar x}^2$ might have a precision based on only one instead of $(k - 1)$ degrees of freedom. The $t$-test is, however, known to be relatively insensitive to most types of departure from normality in the original data and may be recommended in the great majority of cases, except where the probabilities of a number of tests are being combined, or where the test gives a result very near one of the significance levels and an exact verdict is wanted.

Two alternative tests may prove useful. In many cases it may be sufficiently precise to take account of the signs of the responses only. The efficiency of this test, where the variances are equal, has been shown to be $\frac{2}{\pi}$ or 64 per cent. in (6), where a table of the 5 per cent. points is given. In doubtful cases, with a small number of centres, an exact test of significance may be made by assuming that in the absence of a true mean response, the responses observed at the centres would have occurred with positive or negative signs equally frequently. The complete distribution of the mean may be worked out, as exemplified in (1), pp. 50-4. Bartlett (7) has considered the approximation to this distribution by a continuous frequency distribution, and further work on his lines may enable us to assign an appropriate number of degrees of freedom to $s_{\bar x}^2$ in any particular case by use of some statistic such as Fisher's ((5), § 14). In a few examples I have worked out, the $t$-test gives a good approximation.
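For a small number of centres the exact test described above can be carried out by direct enumeration. The Python sketch below is an illustration only: it attaches every possible combination of signs to the observed absolute responses and returns the proportion of the $2^k$ arrangements whose total is numerically at least as large as that observed. The two-sided form and the function name are choices made here, and the responses are best supplied as integers or exact fractions so that ties are compared exactly.

```python
from fractions import Fraction
from itertools import product


def exact_sign_randomisation_probability(responses):
    """Proportion of the 2^k sign arrangements of the observed responses
    whose total is numerically at least as large as the observed total."""
    observed = abs(sum(responses))
    magnitudes = [abs(r) for r in responses]
    count = 0
    total = 0
    for signs in product((1, -1), repeat=len(magnitudes)):
        total += 1
        arrangement = sum(s * a for s, a in zip(signs, magnitudes))
        if abs(arrangement) >= observed:
            count += 1
    return Fraction(count, total)


# Four centres, all responses positive: only the all-plus and all-minus
# arrangements reach the observed total.
print(exact_sign_randomisation_probability([1, 2, 3, 4]))
```

Because the number of arrangements doubles with each additional centre, the enumeration is practicable only when $k$ is small, which is the case for which the text proposes the test.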

§ 8. A Test of Significance of the Variation in the Local Experimental Errors.

It sometimes happens that the individual experiments may all be regarded as having the same precision, and in this case, as pointed out in § 1, the analysis is much simplified. It is on this account worth having an idea of the amount of purely random sampling variation to be expected in the experimental errors. To obtain this, the experimental variances $s_i^2$ are each divided by their mean $\overline{s^2}$ over all $k$ centres. The corresponding values, multiplied by $n$, should be distributed approximately as $\chi^2$ with $n$ degrees of freedom, and their range may be compared with the published table (5).

A general test of significance of departure from the hypothesis that the variances are all estimates of the same quantity has been given by Neyman and Pearson (8). The test function which they recommend is

$$L_1 = \frac{\left(s_1^2\, s_2^2 \cdots s_k^2\right)^{1/k}}{\frac{1}{k}\, S(s_i^2)}.$$

Tables have been given by Nayer (9).

In conclusion, while the above text has referred verbally to agricultural field experiments, the problems discussed are likely to turn up in any large-scale co-operative experiment. In particular the data considered by Neyman and Pearson (8), which arose in a factory experiment on the control of uniformity of product, are of exactly the same form as those discussed above. Their figures give the breaking strength under tension of small briquettes of cement-mortar. The cement was mixed on each of 10 different days, and 5 briquettes were tested each day. Thus the results provide a mean breaking strength for each day and an estimate $s_i$, based on 4 degrees of freedom, of the variability in the strengths within days. The questions which are of interest in this and similar factory experiments on control of quality may be somewhat different from those in agricultural field experiments. The question whether the standard errors $s_i$ vary from day to day, which in agricultural field experiments is not of practical importance, except in so far as it affects the efficiency of the experiments, and indeed is usually taken for granted, is one of the prime factors to be tested in a manufacturing experiment. On the other hand, the estimation of the mean of the $x_i$ and the question whether the $x_i$ have varied from day to day or from centre to centre is usually of common interest in both problems, and the discussion given above of the tests of significance may be of use in factory as well as in field experiments.


Summary.


This paper considers the statistical analysis appropriate to experiments which yield, at each of a number of centres or times, an estimate $x_i$ of a treatment effect and an estimate $s_i$ of its standard error, based on $n$ degrees of freedom. This type of data may arise in many modern types of research, as, for instance, series of agricultural field experiments, or factory experiments on the control of quality. The problems considered are the estimation and test of significance of the mean treatment effect and of its variation from centre to centre, these being the most important preliminary questions in agricultural experiments of this type.

If the estimates $x_i$ may be considered equally accurate, i.e. if the quantities $s_i$ are all estimates of the same $\sigma$, the analysis of variance gives a convenient and familiar method of treatment. Where this is not so, the question is more difficult.

In the absence of any variation in treatment effect from centre to centre, the weighted mean $S\left(\frac{x}{s^2}\right) \Big/ S\left(\frac{1}{s^2}\right)$ is suggested as a suitable estimate of the average treatment effect if at least 15 degrees of freedom are available in the estimates $s_i$. With fewer than 15 degrees of freedom, the weighted mean is not very efficient, and the maximum likelihood estimate, the solution of

$$S\left\{\frac{x - \hat\mu}{n s^2 + (x - \hat\mu)^2}\right\} = 0,$$

is preferable, since its increased precision, which is equivalent to having three extra degrees of freedom for the estimation of the weights, is well worth the extra labour it involves. With fewer than 6 degrees of freedom in $s_i$, estimation of the weights involves a considerable loss of information. A comparison is made in this case of the relative efficiencies of the maximum likelihood solution, the unweighted mean, and the weighted mean with an arbitrarily chosen upper limit to the possible weights, in a set of hypothetical examples designed to cover the variation in experimental errors likely to occur in practice. The weighted mean with fixed upper limit is very satisfactory from the point of view of precision, but the difficulty of assigning a standard error to it is a serious disadvantage. Tests of significance of the ordinary weighted mean, the maximum likelihood solution and the unweighted mean are given.

Where the variation in treatment effect is not assumed non-existent, the unweighted mean should be used. The question of testing its significance by comparison with the variation in response from centre to centre is discussed.

The weighted sum of squares of deviations $Q = S\left\{\frac{1}{s^2}(x - \bar x_w)^2\right\}$ is recommended to test the significance of the variation in treatment effect from centre to centre. An investigation of the frequency distribution of $Q$ is made and a transformation given by which it may be referred to the published table of $\chi^2$.

Further work is needed to determine more precise tests for the case of a few centres only.

I have to thank Mr. F. Yates for some useful suggestions. 

References.

1 R. A. Fisher, The Design of Experiments. Edinburgh: Oliver and Boyd, 1935, pp. 211-5.

2 M. S. Bartlett, “The Information Available in Small Samples,” Proc. Camb. Phil. Soc., 1936, Vol. XXXII, p. 562.

3 R. A. Fisher, “The Mathematical Distributions Used in the Common Tests of Significance,” Econometrica, 1935, Vol. 3, No. 4, p. 356.

4 H. Fairfield Smith, “The Problem of Comparing the Results of Two Experiments with Unequal Errors,” C.S.I.R. Journal, 1936, Vol. 9, No. 3, pp. 211-2.

5 R. A. Fisher, Statistical Methods for Research Workers. Edinburgh: Oliver and Boyd, 6th ed., 1936.

6 W. G. Cochran, “The Efficiencies of the Binomial Series Tests of Significance of a Mean and of a Correlation Coefficient,” J. Roy. Stat. Soc., 1937, Vol. C, Part I, pp. 69-73.

7 M. S. Bartlett, “The Effect of Non-Normality on the t-Distribution,” Proc. Camb. Phil. Soc., 1935, Vol. XXXI, p. 228.

8 J. Neyman and E. S. Pearson, “On the Problem of k Samples,” Bull. de l'Acad. Polonaise des Sciences et des Lettres, A, 1931, pp. 460-81.

9 P. P. N. Nayer, “An Investigation into the Application of the $L_1$ test, with Tables of Percentage Limits,” Statistical Research Memoirs, 1936, Vol. I.



1937] 


119 


Significance Tests which may be Applied to Samples from any Populations.

By E. J. G. Pitman, 

(University of Tasmania). 

1. The object of this paper is to show how we can devise valid tests of significance which involve no assumptions about the forms of the populations sampled. It is also shown that precise fiducial limits can be determined for the difference of means of populations of the same form, no matter what the form of the populations may be. While only one test is discussed in this paper, the principle is applicable to all tests. The main idea is not new, it seems to be implicit in all Fisher's writings; * but perhaps the approach to the subject, frankly starting from the sample and working towards the population instead of the reverse, may be a bit of a novelty.

2. Discordant , Concordant , and Neutral Separations. 

Suppose that we have $m + n$ numbers (not necessarily all different), and that their mean is $\bar z$. Numbers which are equal in value are supposed to be distinguishable from one another; we may think of the $m + n$ numbers as painted on $m + n$ different marbles. Consider a separation of the numbers into two different classes of $m$ and $n$, $(m \le n)$,

$u_1, u_2 \ldots u_m$, with mean $\bar u$,
$v_1, v_2 \ldots v_n$, with mean $\bar v$.

The number, $N$, of such separations is ${}^{m+n}C_m$, provided that when $m = n$ the two classes, though equal in size, are regarded as different, so that two separations in which the classes are simply interchanged are regarded as different separations.

We shall call $|\bar u - \bar v|$ the spread of the separation. Since

$$m\bar u + n\bar v = (m + n)\bar z,$$

an alternative expression for the spread is

$$\frac{(m + n)|\bar u - \bar z|}{n} = \frac{(m + n)|Su - m\bar z|}{mn}.$$

Let M be a fixed integer less than N. Consider any particular 
separation R . If the number of separations with a spread equal to 
or greater than that of R is less than or equal to M, we shall call R 

* See, for example, R. A. Fisher, The Design of Experiments , p. 50 (Oliver 
and Boyd). 



120 Pitman - Significance Tests which may be [No. 1,


discordant. If there are $M$ or more separations with a greater spread than $R$, we shall call $R$ concordant. A separation which is neither discordant nor concordant will be called neutral. When there are neutral separations, the number of discordant separations will be less than $M$ and the number of concordant separations less than $N - M$. With no neutral separations the numbers will be $M$ and $N - M$ respectively. If $m = n$, the separations occur in pairs with equal spreads, and in that case we shall always take $M$ to be even. The discordant separations are most easily picked out as those with the largest values of $|Su - m\bar z|$.

If the separation is arrived at by chance, by taking at random $m$ of the numbers from the whole set of $m + n$, the probability of its being discordant is $M/N = P$, say, when there are no neutral separations, and the probability of a concordant separation is $1 - P$. When there are neutral separations, the probabilities are less than $P$ and $1 - P$ respectively. It should be noted that throughout any discussion $m$, $n$, $M$ are supposed to be fixed.
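The definitions can be applied mechanically by enumerating every separation. The Python sketch below is illustrative only; the function name is hypothetical. It ranks separations by $(m+n)|Su - m\bar z|$, which is proportional to the spread, treats equal values as distinguishable by working with positions rather than values, and counts the separation itself among those with a spread equal to or greater than its own.

```python
from itertools import combinations


def classify_separation(first_class, second_class, M):
    """Classify the separation (first_class, second_class) as
    'discordant', 'concordant' or 'neutral' for the fixed integer M."""
    pooled = list(first_class) + list(second_class)
    m, size = len(first_class), len(pooled)
    total = sum(pooled)

    def deviation(indices):
        # (m + n)|Su - m z-bar|, proportional to the spread.
        return abs(size * sum(pooled[i] for i in indices) - m * total)

    observed = deviation(range(m))
    at_least = greater = 0
    for indices in combinations(range(size), m):
        d = deviation(indices)
        at_least += d >= observed
        greater += d > observed

    if at_least <= M:
        return "discordant"
    if greater >= M:
        return "concordant"
    return "neutral"


# Nine integers split into classes of 4 and 5, so N = 126; M = 6.
print(classify_separation([0, 11, 12, 20], [16, 19, 22, 24, 29], M=6))
```

Integer (or exact fractional) inputs are advisable, since the classification depends on exact comparison of spreads. The enumeration visits all $N$ separations and is therefore practicable only for small $m$ and $n$.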

If we increase every member of the upper class (that with the 
greater mean) of a separation by the same positive quantity d, or 
if we decrease every member of the lower class by d, the spread of 
the separation will be increased by d, while the spread of any other 
separation will be increased, if at all, by a quantity less than d , 
except that when m = n the spread of the complementary separation 
(derived by interchange of classes) will also be increased by d. 
Hence a discordant separation treated in this way will remain 
discordant. A neutral separation will become discordant no matter 
how small d is, provided that when m = n we always take M even, 
as stated above. 

If $d$ is positive and not greater than the spread of a separation, and if we decrease every member of the upper class by $d$, or increase every member of the lower class by $d$, the spread will be decreased by $d$. Hence if $d$ is sufficiently small a discordant separation will remain discordant. A neutral separation will become concordant, no matter how small $d$ is.

Let

$x_1, x_2 \ldots x_m, \; y_1, y_2 \ldots y_n$

be $m + n$ given numbers, and consider the values of $d$ which make the separation

$x_1 + d, \; x_2 + d, \ldots x_m + d,$

$y_1, y_2 \ldots y_n,$

discordant. Denote by $d_1$ the upper bound of those which satisfy the inequality,

$$\bar x + d < \bar y,$$



1937] Applied to Samples from any Populations. 121

and by $d_2$ the lower bound of those which satisfy the inequality,

$$\bar x + d > \bar y.$$

From the results given above it follows that the separation is discordant if

$$d < d_1 \quad\text{or}\quad d > d_2,$$

concordant if

$$d_1 < d < d_2,$$

neutral if

$$d = d_1, \quad\text{or}\quad d = d_2.$$

3. Discordant, Concordant, and Neutral Pairs of Samples.

Suppose now that we have two samples of $m$ and $n$, and that the sample values are

$x_1, x_2 \ldots x_m,$

$y_1, y_2 \ldots y_n.$

Whether the samples are from the same population or not, we shall say that they are discordant, concordant, or neutral, according as the separation of the $m + n$ numbers,

$x_1, x_2 \ldots x_m, \; y_1, y_2 \ldots y_n$

into the two classes

$x_1, x_2 \ldots x_m$
$y_1, y_2 \ldots y_n$

is discordant, concordant, or neutral. We shall assume, for the present, that the populations sampled are infinite, or that all drawings are with replacement.

If two samples $A$, $B$ of $m$, $n$ members respectively are drawn from the same population, the probability of their being a discordant pair is $\le P$, and the probability of their being a concordant pair is $\le 1 - P$.

Denote by $C$ the sample of $m + n$ obtained by combining $A$ and $B$. The probability of drawing any particular pair $A$, $B$ is equal to the probability of drawing a sample of $m + n$ identical with $C$, multiplied by the probability of drawing (without replacement) from $C$ a sample of $m$ identical with $A$. Hence the probability of drawing a discordant pair is

$$S(p_C\, q_C),$$

where $p_C$ is the probability of drawing $C$, $q_C$ is the probability of obtaining a discordant pair by drawing $m$ from $C$, and the summation is over all the possible samples $C$ of $m + n$ members. But

$$q_C \le P,$$

therefore the probability is not greater than

$$P\,S\,p_C = P.$$

Similarly the probability of drawing a concordant pair is not greater than $1 - P$.



122 


Pitman —Significance Tents which may be [No. 1, 


If, then, we always decide that discordant samples come from 
different populations, the probability of error when the samples 
actually come from the same population is not greater than P.* 
In practice we choose M so that P is approximately equal to our
usual working value, such as 0.05 or 0.01, for the permissible probability
of error of such statements. It is obvious that for two
populations of given forms, the larger the difference of their means 
the more likely is the test to give a significant result. 

We can always determine whether two given samples are dis¬ 
cordant or not from an examination of the sample values alone, 
without any knowledge of the populations from which they have 
been drawn. When m and n are at all large, the direct process 
would often be tedious; but frequently the work can be greatly 
reduced by the use of an approximate method described later. 

Example. Are the following samples significantly different?

1.2, 2.3, 2.4, 3.2 and 2.8, 3.1, 3.4, 3.6, 4.1.

For convenience we may, without affecting the test, subtract 1.2
from each sample value so as to reduce the smallest number to 0,
and then multiply each by 10 to get rid of the decimal points. We
then have

0, 11, 12, 20 and 16, 19, 22, 24, 29.

Arranging these in order of magnitude, we have

0, 11, 12, 16, 19, 20, 22, 24, 29.

The mean value is 17, so that $m\bar{z} = 68$. There are 126 separations
of the numbers into classes of 4 and 5. Hence if we take M = 6,
we shall have P = 1/21.

The groups of 4 which give the largest values of $|\Sigma u - 68|$ are:

    Group               Σu     |Σu - 68|
    0, 11, 12, 16       39        29
    0, 11, 12, 19       42        26
    0, 11, 12, 20       43        25
    0, 11, 12, 22       45        23
    29, 24, 22, 20      95        27
    29, 24, 22, 19      94        26
    29, 24, 20, 19      92        24
    29, 24, 22, 16      91        23

The group 0, 11, 12, 20 gives the fifth largest value of $|\Sigma u - 68|$,
and so with M = 6 the corresponding separation is discordant.
Hence at the level P = 1/21 the samples must be regarded as
significantly different. M = 5 would give P approximately 0.0397,
and at this level also the samples are significantly different.
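The enumeration in this example is small enough to check by machine. The following is a minimal sketch in Python, not part of the original paper; the function name `separation_rank` is illustrative. It lists every separation of the nine modified values into classes of 4 and 5 and counts how many have a spread at least as large as that of the observed separation.

```python
from itertools import combinations


def separation_rank(sample, other):
    """Count separations whose spread |sum(u) - m*zbar| is >= the observed one."""
    pooled = list(sample) + list(other)
    m, big_n = len(sample), len(pooled)
    total = sum(pooled)

    def spread(group):
        # N * |sum(group) - m * mean|, kept in integers to avoid rounding
        return abs(big_n * sum(group) - m * total)

    observed = spread(sample)
    spreads = [spread(g) for g in combinations(pooled, m)]
    at_least = sum(s >= observed for s in spreads)
    return at_least, len(spreads)


at_least, n_separations = separation_rank([0, 11, 12, 20], [16, 19, 22, 24, 29])
print(at_least, n_separations, round(at_least / n_separations, 4))
```

The count returned for the observed group, divided by the number of separations, is the smallest level at which the pair would be judged discordant.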

* We could, of course, make it exactly P by properly assigning by chance 
neutral pairs to the discordant or concordant class. 






4. Fiducial limits for the difference of means of two populations.

Let

$x_1, x_2, \ldots, x_m$

$y_1, y_2, \ldots, y_n$

be samples from two populations which differ only in location, so
that by change of origin the frequency function of one population
could be made the same as the frequency function of the other. The
two populations might, for example, be two normal populations
with equal standard deviations but different means, or two exponential
populations with equal scales but different origins.

Denote the mean of the $x$ population by $\alpha$ and the mean of the
$y$ population by $\beta$, and put $\beta - \alpha = d$; then the distribution of
$x + d$ is the same as the distribution of $y$. Hence

$x_1 + d,\ x_2 + d,\ \ldots,\ x_m + d$ . . . (i)

$y_1,\ y_2,\ \ldots,\ y_n$ . . . (ii)

may be regarded as two samples from the same population, and
therefore the probability that they are discordant is $\le P$, and the
probability that they are concordant is $\le 1 - P$. Now, for given
values of the $x$ and the $y$ we can, as shown above, determine numbers
$d_1$, $d_2$ such that the samples (i) and (ii) are discordant when

$d < d_1$ or $d > d_2,$

and concordant when

$d_1 < d < d_2.$

Hence the statement

$d_1 \le \beta - \alpha \le d_2$

has fiducial probability $\ge 1 - P$, for it will be untrue if, and only
if, the samples (i) and (ii) are discordant, and the probability of
this is $\le P$. On the other hand, the statement

$d_1 < \beta - \alpha < d_2$

has fiducial probability $\le 1 - P$, for it is true if, and only if, the
samples (i) and (ii) are concordant. If we were really sampling
from continuous distributions only, the probability of a neutral
pair would be strictly zero, and both statements would have fiducial
probability $1 - P$. In many cases the probability of a neutral
pair will be negligible; but it is always advisable to use the less
stringent statement

$d_1 \le \beta - \alpha \le d_2,$

for we are certain that its fiducial probability is $\ge 1 - P$.

Let us consider again the pair of samples discussed above, and
suppose that they are drawn from two populations which differ
only in location. Taking the modified values

0, 11, 12, 20 and 16, 19, 22, 24, 29,

we have to determine the values of $d$ which make the separation

$d,\ d + 11,\ d + 12,\ d + 20$

16, 19, 22, 24, 29

neutral. We shall again take M = 6, P = 1/21.

When $d = 1.5$, $m\bar{z} = 70\tfrac{2}{3}$, and the groups of 4 which give the
largest values of $|\Sigma u - 70\tfrac{2}{3}|$ are

    Group                    Σu      |Σu - 70⅔|
    1½, 12½, 13½, 16         43½       27⅙
    1½, 12½, 13½, 19         46½       24⅙
    1½, 12½, 13½, 21½        49        21⅔
    1½, 12½, 16, 19          49        21⅔
    29, 24, 22, 21½          96½       25⅚
    29, 24, 22, 19           94        23⅓
    29, 24, 21½, 19          93½       22⅚

Thus there are five groups giving a larger value of $|\Sigma u - m\bar{z}|$ than
the group 1½, 12½, 13½, 21½, and one group giving the same value.
Hence when $d = 1.5$ the separation is neutral, and therefore $d_1 = 1.5$.
In the same way we obtain $d_2 = 22$. If $\alpha$ denotes the mean of the
population from which the set of four are drawn, and $\beta$ the mean
of the other population, we have

$1.5 \le \beta - \alpha \le 22,$

with fiducial probability 20/21. With the original scale the
statement would be

$0.15 \le \beta - \alpha \le 2.2.$
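The search for the limits can be checked by classifying the shifted separation for trial values of $d$. The sketch below is an illustration only. The function name `classify_separation` is hypothetical, and the rule used to decide between the three classes is an assumed reading of the definitions: discordant when at most M separations have a spread at least as large as the observed one, concordant when at least M have a strictly larger spread, and neutral otherwise. Exact rational arithmetic is used so that ties are detected reliably.

```python
from fractions import Fraction
from itertools import combinations


def classify_separation(x, y, d, big_m):
    """Classify the separation (x + d | y) as discordant, concordant or neutral.

    Assumed rule: discordant if at most M separations have a spread >= the
    observed one, concordant if at least M have a strictly larger spread,
    neutral otherwise (a tie straddling the M-th place).
    """
    shifted = [Fraction(v) + Fraction(d) for v in x]
    pooled = shifted + [Fraction(v) for v in y]
    m, big_n = len(shifted), len(pooled)
    total = sum(pooled)

    def spread(group_sum):
        # N * |sum(group) - m * mean|, exact in rationals
        return abs(big_n * group_sum - m * total)

    observed = spread(sum(shifted))
    spreads = [spread(sum(g)) for g in combinations(pooled, m)]
    greater = sum(s > observed for s in spreads)
    at_least = sum(s >= observed for s in spreads)
    if at_least <= big_m:
        return "discordant"
    if greater >= big_m:
        return "concordant"
    return "neutral"


x, y = [0, 11, 12, 20], [16, 19, 22, 24, 29]
for d in ["0", "3/2", "10", "22"]:
    print(d, classify_separation(x, y, d, 6))
```

Under this reading the two neutral values of $d$ mark the ends of the interval, and values between them give concordant separations.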

Instead of the spread, $|\bar{u} - \bar{v}|$, we may use $|\bar{u} - \bar{z}|$, which is
proportional to it, as the criterion of discordance of a separation,
or we may use any monotonic function of this such as $(\bar{u} - \bar{z})^2$.
It is therefore of interest to find an approximate form for the distribution
of $(\bar{u} - \bar{z})^2$ when m and n are large.

5. Approximate distribution of the square of the difference between
sample mean and population mean when the sample is drawn without
replacement from a finite population.

Let

$z_1, z_2, \ldots, z_N$

be N numbers with mean $\bar{z}$, and let the moments of the $z$ about their
mean be $\mu_2$, $\mu_3$, etc. Suppose that a sample of m is drawn without
replacement. Denoting the mean of the sample by $\bar{u}$, we have the
following expressions * for the mean values of the powers of $(\bar{u} - \bar{z})$:

$$E(\bar{u} - \bar{z}) = 0,$$

$$E(\bar{u} - \bar{z})^2 = \frac{(N - m)\mu_2}{(N - 1)m},$$

$$E(\bar{u} - \bar{z})^4 = \frac{3N(N - m)(N - m - 1)(m - 1)}{m^3(N - 1)(N - 2)(N - 3)}\,\mu_2^2 + \frac{(N - m)\{N^2 + N - 6m(N - m)\}}{m^3(N - 1)(N - 2)(N - 3)}\,\mu_4,$$

$$E(\bar{u} - \bar{z})^{2r} = 1 \cdot 3 \cdot 5 \cdots (2r - 1)\left\{\left(\frac{1}{m} - \frac{1}{N}\right)\mu_2\right\}^r + \text{terms of higher order in } 1/m,\ 1/N$$

when m and N are large and of the same order.

* See e.g. O. N. Anderson: Einführung in die mathematische Statistik, 1935,
p. 219. Note that Anderson's n is here replaced by m.

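If we put

$$w = \frac{m(\bar{u} - \bar{z})^2}{(N - m)\mu_2},$$

w will be positive and not greater than unity (see below), and we
shall have

$$E(w) = \frac{1}{N - 1},$$

$$E(w^2) = \frac{3N}{(N - 1)(N - 2)(N - 3)}\left\{1 - \frac{N - 1}{m(N - m)}\right\} + \frac{1}{(N - 1)(N - 2)(N - 3)}\left\{\frac{N(N + 1)}{m(N - m)} - 6\right\}\frac{\mu_4}{\mu_2^2}$$

$$= \frac{3}{(N - 1)(N + 1)} + \frac{1}{(N - 1)(N - 2)(N - 3)}\left\{\frac{N(N + 1)}{m(N - m)} - 6\right\}\left\{\frac{\mu_4}{\mu_2^2} - \frac{3(N - 1)}{N + 1}\right\}$$

$$= \frac{3}{(N - 1)(N + 1)}\,(1 + \theta),$$

where

$$\theta = \frac{N + 1}{3(N - 2)(N - 3)}\left\{\frac{N(N + 1)}{m(N - m)} - 6\right\}\left\{\kappa + \frac{6}{N + 1}\right\}$$

and

$$\kappa = \frac{\mu_4}{\mu_2^2} - 3.$$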

Now for fixed N the modulus of the factor

$$\frac{N(N + 1)}{m(N - m)} - 6$$

will have a maximum value,

$$\frac{2(N - 2)}{N},$$

at $m = \tfrac{1}{2}N$, and it will take this value again at

$$\frac{N - 2m}{N} = \pm\sqrt{\frac{N - 2}{2N - 1}},$$

which gives

$$\frac{m}{N - m} = \tfrac{1}{5} \text{ or } 5 \text{ for } N = 14,$$

$$= \frac{1}{5.8} \text{ or } 5.8 \text{ approximately for } N \text{ large.}$$

For values of $m/(N - m)$ which lie between these, the modulus will
have a value which is smaller than $2(N - 2)/N$. It will be found
that, owing to the fact that m must take integral values, with the
exception of the case N = 12, m = 2, for values of N greater than
6 the modulus of the expression

$$\frac{N(N + 1)}{m(N - m)} - 6$$

is not greater than

$$\frac{2(N - 2)}{N}$$

provided

$$\tfrac{1}{5} \le \frac{m}{N - m} \le 5.$$

Hence for such values of N and m

$$|\theta| \le \frac{2(N + 1)}{3N(N - 3)}\left|\kappa + \frac{6}{N + 1}\right|,$$

so that $\theta$ may be neglected when N is large and $\kappa$ not too large.
From the general expression for $E(\bar{u} - \bar{z})^{2r}$, we have

$$E(w^3) = \frac{3 \cdot 5}{(N - 1)(N + 1)(N + 3)},$$

correct to the third order.
Thus,

$$0 \le w \le 1, \qquad E(w) = \frac{1}{N - 1},$$

and approximately

$$E(w^2) = \frac{3}{(N - 1)(N + 1)}, \qquad E(w^3) = \frac{3 \cdot 5}{(N - 1)(N + 1)(N + 3)}, \text{ etc.}$$

The moments about the origin of the $B(p, q)$ distribution, i.e. the
continuous distribution from 0 to 1 with frequency function

$$\frac{1}{B(p, q)}\,x^{p - 1}(1 - x)^{q - 1},$$

are

$$\frac{p}{p + q}, \quad \frac{p(p + 1)}{(p + q)(p + q + 1)}, \quad \frac{p(p + 1)(p + 2)}{(p + q)(p + q + 1)(p + q + 2)}, \text{ etc.},$$

which will be the same as those above if $p = \tfrac{1}{2}$, $q = \tfrac{1}{2}N - 1$. Thus
when N, m, N − m are large and $\mu_4/\mu_2^2$, etc., not too large, the distribution
of w will be approximately the same as a $B(\tfrac{1}{2}, \tfrac{1}{2}N - 1)$
distribution.

As an indication of how good the approximation can be, even 
with quite small values of N, m, N — m, a table is given below which 
shows the exact and the approximate distribution of w for a sample 
of 4 taken from the population 

1, 2, 3, 4, 5, 6, 7, 8. 

Here N = 8, so that the approximate distribution is a $B(\tfrac{1}{2}, 3)$ distribution.
$P_1$ is the probability of obtaining a value of w equal to
or less than that shown; $P_2$ is the approximate value. In obtaining
the value of $P_2$ corresponding to any value of w a rough “continuity
correction” has been applied by integrating from 0 to the
point half-way between that value of w and the next higher value.
In practice we could not usually apply such a correction without 
going to a lot of trouble to determine the values which w can take; 
but its effect would be small at the critical values of w, where the 
ordinates of the continuous distribution are small. 


    w         P_1       P_2
    0.0000    0.1143    0.1446
    0.0119    0.3143    0.3173
    0.0476    0.5143    0.4954
    0.1071    0.6571    0.6547
    0.1905    0.8000    0.7865
    0.2976    0.8857    0.8861
    0.4286    0.9429    0.9522
    0.5833    0.9714    0.9874
    0.7619    1.0000    1.0000
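The exact column of this table can be reproduced by enumerating all 70 samples of 4 from the population 1 to 8. The sketch below is an illustration only (the function name `exact_w_distribution` is not from the paper). It takes $w = m(\bar{u} - \bar{z})^2 / \{(N - m)\mu_2\}$, which is consistent with $E(w) = 1/(N - 1)$, and works in exact fractions.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations


def exact_w_distribution(population, m):
    """Return [(w, P1)] with P1 = Pr(W <= w) for samples of m drawn without replacement."""
    big_n = len(population)
    total = sum(population)
    # N^2 * mu_2, an integer when the population values are integers
    scaled_var = big_n * sum(z * z for z in population) - total ** 2
    counts = Counter()
    for group in combinations(population, m):
        dev = big_n * sum(group) - m * total  # N * m * (ubar - zbar)
        # w = m (ubar - zbar)^2 / ((N - m) mu_2) = dev^2 / (m (N - m) N^2 mu_2)
        counts[Fraction(dev * dev, m * (big_n - m) * scaled_var)] += 1
    n_samples = sum(counts.values())
    cumulative, table = 0, []
    for w in sorted(counts):
        cumulative += counts[w]
        table.append((w, Fraction(cumulative, n_samples)))
    return table


for w, p1 in exact_w_distribution(range(1, 9), 4):
    print(f"{float(w):.4f}  {float(p1):.4f}")
```

For this population the possible values of w are $(S - 18)^2/84$, where S is the sample total, so the second and eighth values are 1/84 and 49/84.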


6. Application of the Approximate Distribution of w.

Returning to the discussion of two samples, we may use w as
our criterion of discordance. If the sample values are such that the
above approximation to the distribution of w is sufficiently good,
instead of hunting out the discordant separations and seeing if the
separation determined by the pair of samples is among them, we
may calculate the proportion of separations with a value of w as
great as, or greater than, the value of w, say $w_1$, for the particular
separation we are considering. The proportion of such separations
will be approximately

$$\frac{1}{B(\tfrac{1}{2}, \tfrac{1}{2}N - 1)}\int_{w_1}^{1} x^{-\frac{1}{2}}(1 - x)^{\frac{1}{2}N - 2}\,dx.$$

If the value of this is $\le P$, the separation is discordant and the
samples are significantly different.

For the pair of samples discussed above $w = 0.4833$, and the
value of the integral is 0.0376. Hence at the level P = 0.0376 the
samples would be just significantly different. Actually, as shown
in § 3, they are just significantly different at the level P = 0.0397.
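These two figures can be recomputed from the sample values. The sketch below is illustrative and uses only the standard library. It computes w as the ratio of the between-sample sum of squares to the total sum of squares, and evaluates the upper tail of the $B(\tfrac{1}{2}, \tfrac{1}{2}N - 1)$ distribution by Simpson's rule after the substitution $x = s^2$, which removes the singularity at the origin. The function names and the choice of numerical method are assumptions made for the illustration, and the code assumes $N \ge 4$.

```python
import math


def w_statistic(x, y):
    """Between-sample sum of squares divided by the total sum of squares."""
    m, n = len(x), len(y)
    xbar, ybar = sum(x) / m, sum(y) / n
    s1 = sum((v - xbar) ** 2 for v in x)
    s2 = sum((v - ybar) ** 2 for v in y)
    between = m * n / (m + n) * (xbar - ybar) ** 2
    return between / (s1 + s2 + between)


def beta_upper_tail(w1, big_n, steps=2000):
    """Pr(W >= w1) for W distributed as B(1/2, N/2 - 1); steps must be even."""
    q = big_n / 2 - 1
    a, b = math.sqrt(w1), 1.0
    h = (b - a) / steps

    def f(s):
        # integrand after x = s^2: x^(-1/2) (1 - x)^(q - 1) dx = 2 (1 - s^2)^(q - 1) ds
        return max(0.0, 1 - s * s) ** (q - 1)

    total = f(a) + f(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(a + i * h)
    integral = 2 * total * h / 3
    beta = math.gamma(0.5) * math.gamma(q) / math.gamma(q + 0.5)
    return integral / beta


w = w_statistic([0, 11, 12, 20], [16, 19, 22, 24, 29])
print(round(w, 4), round(beta_upper_tail(w, 9), 4))
```

Because w is unchanged by a change of origin and scale applied to both samples, the modified values give the same result as the original ones.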

If, having chosen P, we find $w_1$ such that the value of the above
integral is P, our criterion of discordance is

$w > w_1.$

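An alternative expression for w is

$$w = \frac{\dfrac{mn}{m + n}(\bar{u} - \bar{v})^2}{S_1 + S_2 + \dfrac{mn}{m + n}(\bar{u} - \bar{v})^2},$$

where

$$S_1 = \Sigma(u - \bar{u})^2, \qquad S_2 = \Sigma(v - \bar{v})^2.$$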


This shows that w is not greater than unity. It is also a convenient
form for determining fiducial limits for the difference of the means
of two populations which differ only in location. Using the notation
of section 4, and referring to the discussion there, we know that the
probability that the samples (i) and (ii) are not discordant is $\ge 1 - P$.
But if they are not discordant we must have

$w \le w_1,$

that is,

$$\frac{\dfrac{mn}{m + n}(d + \bar{x} - \bar{y})^2}{S_1 + S_2 + \dfrac{mn}{m + n}(d + \bar{x} - \bar{y})^2} \le w_1,$$

where $S_1 = \Sigma(x - \bar{x})^2$, $S_2 = \Sigma(y - \bar{y})^2$.

The fiducial probability of this last statement is therefore $\ge 1 - P$.
It determines two limits equidistant from $\bar{y} - \bar{x}$. For the pair of
samples already discussed we obtain, with P = 1/21,

$0.07 \le d \le 2.18,$

on the original scale. These limits are not very different from
those, 0.15 and 2.2, obtained by the direct process without approximation.

The criterion for discordance of two samples,

$w > w_1,$

may be written

$$\frac{w}{1 - w} > \frac{w_1}{1 - w_1},$$

which is

$$\frac{\dfrac{mn}{m + n}(\bar{x} - \bar{y})^2}{S_1 + S_2} > \frac{w_1}{1 - w_1},$$

and the inequality which determines the fiducial limits may be
written

$$\frac{\dfrac{mn}{m + n}(\bar{y} - \bar{x} - d)^2}{S_1 + S_2} \le \frac{w_1}{1 - w_1}.$$

In this form the test is identical in form with Fisher's extension
of Student's test, and the $t$-distribution used is the same, i.e. the
distribution of w is the same as the distribution of the square of
Student's z for a sample of $m + n - 1$. In the development of
the theory given in this paper there is no assumption of even ap¬ 
proximate normality of the populations sampled. The fundamental 
test of significance is that given in § 3, and the fundamental method 
of determining fiducial limits is that which is derived from this in 
§ 4 . Whether the approximate method is to be used or not is decided 
entirely by the sample values. Of course these depend on the 
populations sampled, and populations which are close to normal 
will supply a large proportion of samples amenable to the approxi¬ 
mate form of the test. But the essential point of the method is 
that we do not have to worry about the populations which we do 
not know, but only about the sample values which we do know. 

7. Application of the Test to Samples Drawn from Finite Populations
without Replacement.

In proving the theorem that the probability of drawing a discordant
pair of samples from the same population is $\le P$, we
assumed that the populations sampled were infinite, or that all
drawings were with replacement. But the argument of § 3 will
still hold when samples are drawn without replacement from a
finite population provided that the samples are simultaneous,
that is, provided that both are drawn before either is replaced.
In particular, the theorem is true when the two samples together
form the whole population; it then needs no proof.

This type of case is perhaps of more practical importance than 
the other. Consider an agricultural experiment in which a block 
of ground is divided into m + n equal plots which are all sown at 
the same rate with the same variety. To m of these plots, chosen 
at random, treatment A is applied, and to the remaining n treatment 
B. The yields are as follows :— 

Treatment A: $x_1, x_2, \ldots, x_m$
Treatment B: $y_1, y_2, \ldots, y_n$

Now, the yield from any plot may be regarded as the sum of three
terms, thus

$x_1 = a + X_1 + X_1',$

where $a$ denotes the effect of treatment A, $X_1$ denotes the effect of
the fertility of the particular plot and other things peculiar to that
plot, which we think of as determined before we made our random
choice and sowed the seed, while $X_1'$ denotes the effect of experimental
errors in weighing, local accidents, and other things which happened
to that particular plot after we had made our random choice and
sown the seed. Now, though $X_1$ and $X_1'$ were determined at different
times, their sum may be called the chance contribution of the plot,
and its value is $x_1 - a$. If we denote the effect of treatment B by $b$,
the chance contribution of the first $y$ plot is $y_1 - b$.

The separation

$x_1 - a,\ x_2 - a,\ \ldots,\ x_m - a,$

$y_1 - b,\ y_2 - b,\ \ldots,\ y_n - b$

of the $m + n$ numbers

$x_1 - a,\ x_2 - a,\ \ldots,\ x_m - a,\ y_1 - b,\ y_2 - b,\ \ldots,\ y_n - b$

is a chance separation. Hence the probability that it is discordant
is $\le P$. We shall not affect the discordance or non-discordance by
adding $b$ to each of the numbers. We then have

$x_1 + (b - a),\ x_2 + (b - a),\ \ldots,\ x_m + (b - a)$

$y_1,\ y_2,\ \ldots,\ y_n.$

The statement that this separation is not discordant has fiducial
probability $\ge 1 - P$. But this statement is, as shown before,
equivalent to

$d_1 \le b - a \le d_2,$

where $d_1$, $d_2$ can be determined from the given numbers

$x_1, x_2, \ldots, x_m,\ y_1, y_2, \ldots, y_n.$

Thus fiducial limits for b — a can be determined. If zero is not 
included in the range determined by them, the treatments have 
shown a significant difference in effect. If this was all we wished 
to know we could find it out by seeing if the separation is discordant 
when b — a = 0. 

It is evident that other significance tests can be developed along 
these lines, in particular the variance test and the author hopes to 
deal with this in a further paper. 

Summary . 

It is shown that valid tests of significance can be devised which 
involve no assumptions about the forms of the populations sampled. 

It is also shown that fiducial limits can be determined for the 
difference of means of populations of the same form, no matter 
what the form of the populations may be. 

The test for significance of difference of means of samples which 
is developed in this paper will frequently, in practice, reduce in 
form to Fisher's extension of Student’s test. 



Sub-sampling for Attributes.

By M. S. Bartlett.

I. Introduction. Inferences from Normal and Binomial Samples.

For a measurable character an inference (the calculation, for example,
of fiducial limits) may be made from a sample $S_1$ not only
on the value of the population mean m, but also on the value to be
expected in a second sample $S_2$, or, alternatively, in the entire
“sample” or batch $S = S_1 + S_2$ from which the “subsample”
$S_1$ was taken. Thus if a sample or batch can reasonably be regarded
as drawn from a basic normal population, we have normal variates

$u = \bar{x}_1 - \bar{x}_2, \quad v = \bar{x}_1 - \bar{x}$ . . . . (1)

where $\bar{x}_1$ is the mean of $S_1$, etc., and corresponding variances

$\sigma^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right), \quad \sigma^2\left(\dfrac{1}{n_1} - \dfrac{1}{n}\right).$ . . . . (2)

We may conveniently think of $\bar{x}_1$ as an estimate of $\bar{x}_2$ with a variance
$(1 + n_1/n_2)$ times, or as an estimate of $\bar{x}$ with a variance $(1 - n_1/n)$
times, its variance as an estimate of m. Since an estimate of $\sigma^2$
is available from $S_1$, inferences on $\bar{x}_2$ or $\bar{x}$ may at once be made using
the $t$-distribution (cf. (1), (2) and (3)). An inference on the value of
the variance of $S_2$ or of S may similarly be made. For fiducial
inferences on S, it is of interest to note that an exact solution of the
estimation problem has been made possible in this case, although
$S_1$ is a sample from a finite batch or “population.”

For a non-measurable character or attribute A, an exact fiducial
inference on the value of p, the population probability of A, is no
longer possible, but an exact test of significance for any value of p
can still be made. From it an interval can be assigned to p such that
values of p outside the interval are significantly contradicted by the
sample; and we might still assert that our prediction for p has a
fiducial error not less than a known amount (see (4)).

At this stage it will be convenient to discuss the practical value
of two approximations. The normal approximation to the binomial
is well known, though I am not sure that it has always been used
to the best advantage. To calculate approximate values for the
limits in Example A of (4) (p. 411), where 8 mice are supposed to have
died out of a sample of 30 after injection with a toxic drug of known
strength, we could write for the level P = 0.05,

. . . . (3)

which would give, on squaring, a quadratic equation for p whose roots
are the limits required. As a refinement, however, we shall consider
the observed number to be 8½ when we are calculating the upper
limit, and 7½ for the lower limit, as a correction for discontinuity.

A somewhat simpler method is to use the transformation
$\eta = \sin^{-1}\sqrt{p}$, since it has been pointed out (5) that the corresponding
variate to $\eta$ has a constant variance $1/4n$. Thus for the upper limit
we have

$$\sin^{-1}\sqrt{p} = \sin^{-1}\sqrt{\frac{8\tfrac{1}{2}}{30}} + \frac{1.96}{2\sqrt{30}}. \quad (4)$$

The various results obtained for this example are summarized in
Table I. The $\sin^{-1}$ method is hardly better than the direct normal
method, but the latter involves solving a quadratic; if a better
approximation than the $\sin^{-1}$ method is required, the mean of the
two methods should be taken.
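The two approximate methods can be tried on the mouse example (8 deaths out of 30). The sketch below is an illustration under stated assumptions rather than a transcription of the paper's formulae. The normal limits are taken as the roots of $(c - np)^2 = \lambda^2 npq$ with $c = 8\tfrac{1}{2}$ or $7\tfrac{1}{2}$, which is the usual squared form of the normal approximation. The arcsine limits use a standard deviation of $1/(2\sqrt{n})$ on the transformed scale. The function names and the normal deviates 1.96 and 2.5758 are choices made here.

```python
import math


def normal_limits(r, n, lam):
    """Lower and upper roots of (c - n p)^2 = lam^2 n p (1 - p), c = r -/+ 1/2."""
    def solve(c, pick_upper):
        a = n * n + lam * lam * n
        b = -(2 * n * c + lam * lam * n)
        disc = math.sqrt(b * b - 4 * a * c * c)
        return (-b + disc) / (2 * a) if pick_upper else (-b - disc) / (2 * a)

    return solve(r - 0.5, False), solve(r + 0.5, True)


def arcsine_limits(r, n, lam):
    """Limits from the arcsine transformation with the same continuity correction."""
    half_width = lam / (2 * math.sqrt(n))
    lower = math.sin(math.asin(math.sqrt((r - 0.5) / n)) - half_width) ** 2
    upper = math.sin(math.asin(math.sqrt((r + 0.5) / n)) + half_width) ** 2
    return lower, upper


for lam in (1.96, 2.5758):
    print([round(v, 3) for v in normal_limits(8, 30, lam)],
          [round(v, 3) for v in arcsine_limits(8, 30, lam)])
```

The arcsine sketch assumes the lower angle stays positive after subtracting the half-width, which holds for this example but not for very small counts.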


Table I.

                                       Limits (P = 0.05).     Limits (P = 0.01).
                                       Lower.     Upper.      Lower.     Upper.
    (1) Chart (Clopper and Pearson)    0.12       0.46        0.09       0.52
    (2) “Normal”                       0.130      0.462       0.106      0.518
    (3) sin⁻¹                          0.114      0.455       0.081      0.511
    (4) Mean of (2) and (3)            0.122      0.459       0.094      0.515
    Exact values                       0.123      0.459       0.093      0.516

While the charts in (4) are available when relevant, the value of
these approximations should be noted. They are considered again
in the second part of this paper. To facilitate the use of $\sin^{-1}\sqrt{p}$,
a table (computed ab initio) is appended here (Table II).

Table II.
$y = \sin^{-1}\sqrt{x}$.

    x        0.0 +     0.1 +     0.2 +     0.3 +     0.4 +
    0.01     0.1002    0.3381    0.4760    0.5905    0.6949
    0.02     0.1419    0.3537    0.4882    0.6012    0.7051
    0.03     0.1741    0.3689    0.5002    0.6119    0.7152

    0.04     0.2014    0.3835    0.5120    0.6225    0.7252
    0.05     0.2255    0.3977    0.5236    0.6330    0.7353
    0.06     0.2475    0.4115    0.5351    0.6435    0.7453

    0.07     0.2678    0.4250    0.5464    0.6539    0.7554
    0.08     0.2868    0.4381    0.5576    0.6642    0.7654
    0.09     0.3047    0.4510    0.5687    0.6745    0.7754

    0.10     0.3218    0.4636    0.5796    0.6847    0.7854

For $x > 0.5$, $y = 1.5708 - \sin^{-1}\sqrt{1 - x}$.
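A table of this kind is easily regenerated. The short sketch below (illustrative names, angles in radians) rebuilds the layout of rows 0.01 to 0.10 against columns 0.0 to 0.4, and checks the complement rule for $x > 0.5$.

```python
import math


def arcsine_root(x):
    """y = arcsin(sqrt(x)) in radians, for 0 <= x <= 1."""
    return math.asin(math.sqrt(x))


def table_ii():
    """Rows for x = 0.01 ... 0.10; columns add 0.0, 0.1, 0.2, 0.3, 0.4."""
    rows = []
    for hundredths in range(1, 11):
        rows.append([round(arcsine_root((10 * tenths + hundredths) / 100), 4)
                     for tenths in range(5)])
    return rows


for hundredths, row in enumerate(table_ii(), start=1):
    print(f"{hundredths / 100:.2f}", *(f"{v:.4f}" for v in row))
```

The complement rule follows from $\sin^{-1}\sqrt{x} + \sin^{-1}\sqrt{1 - x} = \tfrac{1}{2}\pi$.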











II. Subsampling for Attributes. 

Let the number of members of $S_1$ with the attribute A be $r_1$, etc.,
so that

$$p(r_1) = \frac{n_1!}{r_1!\,(n_1 - r_1)!}\,p^{r_1} q^{n_1 - r_1}. \quad (5)$$

To infer the nature of $S_2$ or of $S = S_1 + S_2$, we consider

$$p(r_1, r_2 / r) = p(r_1)\,p(r_2)/p(r) = \frac{n_1!\,n_2!\,(n - r)!\,r!}{r_1!\,r_2!\,n!\,(n_1 - r_1)!\,(n_2 - r_2)!}. \quad (6)$$

This distribution merely gives the possible ways of assigning at random
r members to two samples $S_1$ and $S_2$, but the above method of
deriving it is convenient and of theoretical significance (see (6)).

The variates

$$u = \frac{r_1}{n_1} - \frac{r_2}{n_2}, \quad v = \frac{r_1}{n_1} - \frac{r}{n} \quad (7)$$

have variances

$$pq\left(\frac{1}{n_1} + \frac{1}{n_2}\right), \quad pq\left(\frac{1}{n_1} - \frac{1}{n}\right) \quad (8)$$

(compare with equations (1) and (2)), and are useful for large values
of $r_1$ and $r_2$. For $pq$ would be written the theoretical statistic
$r(n - r)/n^2$. Equation (6) gives, however, the exact “configuration”
distribution independent of p; and is available for all values of
$r_1$, $n_1$ and $n_2$.

Suppose, for example, that A is an undesirable attribute. From
a subsample $S_1$ we require the lower limit for r in a total batch S.
The chance of obtaining so few as $r_1$ members in $S_1$, given r in S,
may be written

$$P = \frac{n_2!\,(n - r)!}{(n_2 - r)!\,n!}\left\{1 + \frac{r \cdot n_1}{1 \cdot (n_2 - r + 1)} + \frac{r(r - 1) \cdot n_1(n_1 - 1)}{1 \cdot 2 \cdot (n_2 - r + 1)(n_2 - r + 2)} + \ldots\right\} \quad (9)$$

to $r_1 + 1$ terms; or the first $r_1 + 1$ terms in the expansion of the
hypergeometric series

$$\frac{n_2!\,(n - r)!}{(n_2 - r)!\,n!}\,F(-r, -n_1, n_2 - r + 1, 1). \quad (10)$$

For small values of $r_1$, (9) is fairly readily evaluated for any
value of r. For larger values of $r_1$, and $n_2$ reasonably large, it might
sometimes be useful to notice that P is near to, but always less than,
the first $r_1 + 1$ terms in the expansion of the binomial series

. . . . (11)
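The series form of P can be compared with a direct summation of the hypergeometric probabilities. The following sketch is an illustration with hypothetical function names. It builds the series term by term from the ratio of successive hypergeometric terms, $\dfrac{(r - k + 1)(n_1 - k + 1)}{k\,(n_2 - r + k)}$, and assumes $r \le n_2$ so that the leading factor is defined.

```python
from fractions import Fraction
from math import comb, factorial


def tail_direct(n1, n2, r, r1):
    """Sum of hypergeometric probabilities of 0, 1, ..., r1 marked members in S1."""
    n = n1 + n2
    return sum(Fraction(comb(n1, k) * comb(n2, r - k), comb(n, r))
               for k in range(r1 + 1))


def tail_series(n1, n2, r, r1):
    """First r1 + 1 terms of the series, times the leading factor (needs r <= n2)."""
    n = n1 + n2
    lead = Fraction(factorial(n2) * factorial(n - r),
                    factorial(n2 - r) * factorial(n))
    term, total = Fraction(1), Fraction(1)
    for k in range(1, r1 + 1):
        # ratio of the k-th to the (k-1)-th term of the hypergeometric series
        term *= Fraction((r - k + 1) * (n1 - k + 1), k * (n2 - r + k))
        total += term
    return lead * total


print(float(tail_direct(13, 39, 20, 2)), float(tail_series(13, 39, 20, 2)))
```

Exact fractions are used so that the two forms can be compared for equality rather than to a tolerance.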





Numerical Illustrations.

(a) In the problem proposed by Isserlis (7), in which a sample
of 13 cards from 52 gives 2 “trumps” (number of trumps in pack
unknown), we find

$p(r_1 \le 2 / r = 19) = 0.0636,$
$p(r_1 \le 2 / r = 20) = 0.0463,$

so that we can say that the probability $r \ge 20$ is significantly contradicted
by the subsample for a P = 0.05 (one “tail” only) level
of significance. The “normal” approximation is given by

$$\frac{2\tfrac{1}{2}}{13} + \frac{1.645}{52}\sqrt{\left\{r(52 - r)\,\frac{3}{13 \cdot 4}\right\}} = \frac{r}{52} \quad (12)$$

or r = 20.0, i.e. 20 to the nearest integer. The $\sin^{-1}$ approximation
is given by

$$\sin^{-1}\sqrt{\frac{2\tfrac{1}{2}}{13}} + \frac{1.645}{2}\sqrt{\left(\frac{3}{13 \cdot 4}\right)} = \sin^{-1}\sqrt{\frac{r}{52}} \quad (13)$$

or r = 19.1, i.e. 19. The mean of the two is 19.6, i.e. 20. These
approximations are given in this illustration merely for comparative
purposes; though even here the $\sin^{-1}$ method is useful as a starting
point for the exact solution.
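The card illustration can be reworked numerically. The sketch below is illustrative only: the function names are hypothetical, the exact limit is found by a simple scan over r, and the normal equation is solved by bisection. It assumes the variance factor $1/n_1 - 1/n$ for both approximations, with the continuity correction of one half added to $r_1$.

```python
import math
from math import comb


def lower_tail(n1, n2, r, r1):
    """Chance of r1 or fewer marked members in S1 when the batch of n1 + n2 holds r."""
    n = n1 + n2
    return sum(comb(n1, k) * comb(n2, r - k) for k in range(r1 + 1)) / comb(n, r)


def exact_limit(n1, n2, r1, level):
    """Smallest r whose lower-tail chance does not exceed the level."""
    for r in range(r1, n1 + n2 + 1):
        if lower_tail(n1, n2, r, r1) <= level:
            return r
    return None


def normal_limit(n1, n, r1, lam):
    """Solve p = (r1 + 1/2)/n1 + lam sqrt(p (1 - p) (1/n1 - 1/n)) and return n p."""
    start = (r1 + 0.5) / n1
    factor = 1 / n1 - 1 / n

    def gap(p):
        return p - start - lam * math.sqrt(p * (1 - p) * factor)

    lo, hi = start, 1.0
    for _ in range(80):  # bisection: gap is negative at lo and positive at hi
        mid = (lo + hi) / 2
        if gap(mid) < 0:
            lo = mid
        else:
            hi = mid
    return n * (lo + hi) / 2


def arcsine_limit(n1, n, r1, lam):
    """Arcsine analogue: add lam/2 * sqrt(1/n1 - 1/n) on the transformed scale."""
    angle = math.asin(math.sqrt((r1 + 0.5) / n1)) + (lam / 2) * math.sqrt(1 / n1 - 1 / n)
    return n * math.sin(angle) ** 2


print(exact_limit(13, 39, 2, 0.05),
      round(normal_limit(13, 52, 2, 1.645), 1),
      round(arcsine_limit(13, 52, 2, 1.645), 1))
```

The scan relies on the lower-tail chance decreasing as r increases, so the first r that meets the level is the one required.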

(b) Suppose 5 lamps out of 50 have given a test-life of less than
1800 hours. How many can be expected to give as short a life in a
further batch of 500? The results are set out in this example in
tabular form (Table III). The value of r was found as in (a), and
then $r_2 = r - 5$. Besides the P = 0.05 level (one “tail” only),
the P = 0.025 level was also considered, since the value obtained from
(4), from the P = 0.05 chart (for two “tails”), the size of n being
neglected, could then also be obtained.


Table III.

                                       P = 0.05.        P = 0.025.
    (1) Chart (Clopper and Pearson)    -                circa 110
    (2) “Normal”                       104 (104.1)      116 (115.6)
    (3) sin⁻¹ √(r/n)                   99 (98.6)        108 (107.8)
    (4) Mean of (2) and (3)            101 (101.4)      112 (111.7)
    Exact values                       101              112

In this last example we may contrast the information obtained
with that obtainable by measuring the lives of all the 50 lamps,
by noting that the information on the probit value corresponding to
p, if we assume the underlying measurements are normally distributed,
is

$I = nz^2/pq$ (14)

where z is the value of the normal ordinate corresponding to p (see
(8)). The percentage information retained can thus be regarded
as $100I/n$; representative values of which are reproduced below
(Table IV).

Table IV.










SUPPLEMENT 

TO THE 

JOURNAL OF THE ROYAL STATISTICAL SOCIETY 

Vol. IV., No. 2, 1937. 


Some Examples of Statistical Methods of Research in Agriculture
and Applied Biology.

By M. S. Bartlett.

(Statistician, Imperial Chemical Industries, Limited, Jealott's
Hill Research Station, Bracknell, Berks.)


[Read before the Industrial and Agricultural Research Section of the Royal
Statistical Society, April loth, 1937, Dr. E. C. Snow in the Chair.]


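CONTENTS

                                                                PAGE
Introductory Remarks . . . . . . . . . . . . . . . . . . . . .  137
The Layouts of Some Complex Cotton Experiments . . . . . . . .  138
Partial Confounding  . . . . . . . . . . . . . . . . . . . . .  141
Covariance and Salty Plots . . . . . . . . . . . . . . . . . .  142
Experiments on Residual Effects  . . . . . . . . . . . . . . .  146
Covariance and a Dairy-cow Nutrition Experiment  . . . . . . .  147
Two Further Trials . . . . . . . . . . . . . . . . . . . . . .  154
Sampling Errors  . . . . . . . . . . . . . . . . . . . . . . .  155
The Efficient Detection of Treatment Effects . . . . . . . . .  159
Further Notes on the Analysis of Heterogeneous Data  . . . . .  166
Concluding Remarks . . . . . . . . . . . . . . . . . . . . . .  169
Acknowledgments  . . . . . . . . . . . . . . . . . . . . . . .  169
References . . . . . . . . . . . . . . . . . . . . . . . . . .  169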


Introductory Remarks . 

A fairly complete survey of general statistical methods appropriate 
to experimental problems in agriculture has by now been given to 
this Section. Nevertheless, the wide applicability of these methods, 
stressed in previous papers, implies at the same time that the field
of application can be a very mixed one. My aim here will mainly
be to illustrate the use of statistical technique with some examples 
drawn from experimental work of different kinds: thus I hope to 
indicate both the principles underlying these examples and the more 
particular problems that they raise. Since I intend to avail myself 
of the contents of previous papers, it seems to be a justifiable step 
to refer readers to these papers for an explanation of basic methods 
such as experimental layouts, analysis of variance and covariance,
factorial treatments, confounding and so on. It certainly renders
my own particular task easier, for the onus of explaining all these
things is no longer mine. Reference can also always be made to
Fisher's well-known books, (1) and (2).

I remember mentioning before one interesting feature of this
Section's activities, in enabling the value of statistical method to be
stressed throughout the entire range of research into the economics
of cotton production and manufacture. The last paper dealt with
variation in the quality of finished cotton cloth. The first topic
of the present paper will be agricultural research on cotton.


The Layouts of Some Complex Cotton Experiments* 

In 1934 a series of large-scale field experiments was begun in 
Egypt under a Joint Agricultural Research Scheme.f Factorial 
experiments were put down on representative sites, to see whether 
by a study of the simultaneous variation of several factors in the 
same experiment, changes in cultural practice would be suggested 
which would lead to higher levels of yield and increased response to
fertilizer application. A major problem was the most effective use of
nitrogenous fertilizers on cotton. 

These experiments and their results are described in a series of 
bulletins published by the Royal Agricultural Society, Egypt (see, 
for example, (4), (5) and (6)). My concern here is to note one or two 
points of statistical interest arising from the layouts and analyses of 
results, with which, through Dr. F. Crowther, Jealott’s Hill has been 
associated. The layouts have been on the whole fairly straight¬ 
forward, consisting usually of three or more complete replications, 
with no confounding of higher-order interactions, this type of confounding
not being “forced” into the layout if inconvenient. On
the other hand, the familiar method of using main and sub-plots,
which is also a type of confounding, was often used, especially if
some treatments, such as methods of irrigation, were more conveniently
restricted to main plots. As an example, an experiment
at Bahtim in 1934 (see (4)) consisted of four replications of all
combinations of the following sets of treatments:

    Varieties: Ashmouni, Giza 7, Maarad
    Watering:  Normal, Heavier
    Spacing:   Close, Medium, Wide
    Manuring:  Nil, 150 kg./feddan Nitrochalk, 300 kg./feddan Nitrochalk


* For an explanation of the complex or factorial experiment, see (2) and (3).
† This scheme was initiated in 1934 by the Royal Agricultural Society of
Egypt and Imperial Chemical Industries, Limited. In 1935 the I.G. Farbenindustrie
Aktiengesellschaft also entered the scheme. The experimental work
has been carried out by F. Crowther (I.C.I., Ltd.), A. Tomforde (I.G.) and
Ahmed Mahmoud (Royal Agricultural Society).



The varieties and watering were main plot treatments, a main
plot consisting of a strip of 9 sub-plots, which were each 105 sq.
metres. There were thus four randomized blocks of 6 main plots,
each set of 9 sub-plots being randomized within the main plot. The 
analysis of variance of this experiment will have two parts, corre¬ 
sponding to the main and sub-plots, each with its own error term 
(cf. (3), p. 199). This is not reproduced here; but the error terms 
may be quoted. 

Table I.

Bahtim Cotton Experiment, 1934. Analysis of variance, error terms
(yields in kantars/feddan).

    Error (Sub-plot Basis).    D.F.    Sum of Squares.    Variance. *
    Main plot                   15        59.7715           3.9848
    Sub-plot                   144        34.9245           0.2425
    Total                      159        94.6960           0.5956

From the above table the relative efficiency of the experiment for
the main and sub-plot treatments is at once estimated from the ratio
of the two variances. This ratio is 16.4, indicating not only that the
sub-plot treatments are much more accurately determined, but that
for this experiment the ratio is even greater than the relative number
9 of sub- to main plots. Following Yates ((3), p. 200), we may determine
further the gain or loss of efficiency if no division into main and
sub-plots had been made. This may be done by determining the
ratio of each variance to the mean variance, giving 6.7 for the main
plot treatments, and 0.41 for the sub-plot treatments. (Yates
considered the effect of replacing each treatment mean square by the
appropriate error variance; this will be found to be equivalent to the
above method, owing to the proportionality in this case of the degrees
of freedom for treatments and error for the two parts of the full
analysis of variance.)
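
The three ratios quoted above can be checked directly from the error lines of Table I. The following is only a sketch of that arithmetic; the variable names are illustrative and do not come from the paper.

```python
# Error lines of Table I: degrees of freedom and sums of squares
# (yields in kantars/feddan, sub-plot basis).
main_df, main_ss = 15, 59.7715
sub_df, sub_ss = 144, 34.9245

main_var = main_ss / main_df                        # main plot error variance
sub_var = sub_ss / sub_df                           # sub-plot error variance
mean_var = (main_ss + sub_ss) / (main_df + sub_df)  # variance with no division into main and sub-plots

r = main_var / sub_var        # efficiency of sub-plot relative to main plot comparisons
r_main = main_var / mean_var  # complete randomization v. actual layout, main plot treatments
r_sub = sub_var / mean_var    # complete randomization v. actual layout, sub-plot treatments

print(round(r, 1), round(r_main, 1), round(r_sub, 2))
```

The printed values agree with the 16.4, 6.7 and 0.41 given in the text.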

These ratios are recorded in Table II for the Egyptian cotton
experiments of the series which have been of the main plot, sub-plot
type.

Usually complete randomization would have reduced the information
on the sub-plot treatments to round about 85 per cent. of the
actual value. Partly on the basis of such results, two of the 1936
experiments were completely randomized, each consisting of three
blocks of 72 plots. The percentage standard errors for these two

* The error items given in this column may perhaps fairly conveniently be
termed “variances,” but in succeeding and fuller analysis of variance tables
(e.g. Table XIV) the original heading “variance” has been altered to “mean
square” since the meeting (see discussion following this paper).








experiments were 8.9 and 12.0 per cent., these figures being reasonable,
and comparing very favourably with the percentage sub-plot
errors for other large-scale experiments of the same year.

Table II.

Efficiency Ratios.

r   = relative efficiency of layout for sub-plot to main plot treatments (number of sub-plots
      per main plot given in parentheses).
r_m = relative efficiency of complete randomization to actual layout for main plot treatments.
r_s = relative efficiency of complete randomization to actual layout for sub-plot treatments.

Year.   Site.           r.           r_m.    r_s.

1934    Bahtim          16.4 (9)     6.7     0.41
        Gemmeiza         3.0 (9)     2.6     0.8[?]

1935    Bahtim           3.5 (9)     2.9     0.83
        Tukh             3.3 (12)    2.8     0.86
        Qorashia         4.3 (12)    3.6     0.8[?]
        Sakha            1.5 (12)    1.5     0.9[?]
        Abu Hammad      21.9 (32)    8.9     0.40

1936    Bahtim *         8.3 (6)     3.9     0.47
        Shaba †          3.9 (8)     3.0     0.76
        Ibrahimia †      1.2 (6)     1.2     0.98

* In this experiment the nature of the treatments necessitated two successive divisions of the
main plots, dates of sowing and watering treatments being main plot treatments, cultivation
treatments being given to halves of the main plots, and nitrogen treatments to the sub-plots.
The sub-plot and semi-main plot variances happened, however, to be almost identical, and were
pooled for the purpose of the above Table.
† After correction for salt (see p. 142).

It should, however, be noted that each year had one experiment
in which it is estimated that over half the information on the sub-plot
treatments would have been lost by complete randomization. At
Bahtim in 1936 this is partly due to the number of main plots being
fairly large (36, with 6 sub-plots per main plot), but the variability of the
main plots at Abu Hammad in 1935 was large, due mainly to a
“pocket” of low yield in one part of the site. In the 1934 Bahtim
experiment an examination of the distribution of plot yields suggested
that less contrast between main plot and sub-plot variation might
have been obtained if the blocks and main plots could have been
more compact in shape.

There is evidently a certain risk with complete randomization
(if practicable) of gaining considerably on a set of comparatively few
main plot treatments by losing somewhat heavily on the sub-plot
treatments. It would therefore seem advisable, before complete
randomization were adopted in any experiment of this kind, to ensure
as far as possible (as was done for the two actual experiments cited)
that the site chosen appeared particularly suitable.

A possibility with main and sub-plot layouts should be remembered.
Apparent interactions of sub-plot treatments with main plot
treatments might be due to main plot position, and not to main plot
treatment effects. An examination in certain cases of the error
variance has not, however, revealed any evidence of heterogeneity
associated with this kind of effect; this tends to bear out Yates'
findings ((3), p. 220).







An exceptional difference in variability between blocks in the 
1936 Ibrahimia experiment tended, however (in spite of the known 
insensitiveness of the z test to differences in block variance), to throw 
some doubt on the validity and value of the orthodox analysis for this 
centre, which was rather anomalous in this respect.*


Partial Confounding.

Partial confounding of interactions was adopted in one 1936
experiment, and in two 1935 experiments of a subsidiary group.
At Biba in 1935 an experiment consisted of four replications of a
3 × 3 × 3 set of treatments (nitrogen, phosphate and spacing), in
which two different degrees of freedom of the 8 for the second-order
interaction were confounded in each replication, there being 12
blocks of 9 plots each (for layout, see Fig. 1). By this arrangement
the second-order interaction is only confounded to an extent
of 25 per cent. (see (2), p. 135). Neglecting the effect of confounded
components of the interactions, we may estimate by Yates' method
((3), p. 216) the gain by confounding. In this instance a relative
efficiency of 155 per cent. was obtained, and hence 116 per cent. for
the partially confounded second-order interaction. There has thus
been an estimated gain in information even for the partially
confounded effects.
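
The step from 155 to 116 per cent. is a single multiplication, sketched below. It assumes that an effect confounded to the extent of 25 per cent. keeps the remaining 75 per cent. of the information it would have had if unconfounded; the function name is hypothetical.

```python
def partially_confounded_efficiency(unconfounded_efficiency, fraction_confounded):
    """Efficiency for an effect that loses the stated fraction of its information to blocks."""
    return unconfounded_efficiency * (1.0 - fraction_confounded)

# Biba, 1935: 155 per cent. for unconfounded effects, second-order
# interaction confounded to the extent of 25 per cent.
biba = partially_confounded_efficiency(1.55, 0.25)
print(round(100 * biba))
```

A result above 100 per cent. is what the text means by a gain in information even for the partially confounded effects.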

For the other 1935 experiment, at Abu Qir, the third-order interaction
was completely confounded in a 2 × 2 × 2 × 2 set of treatments
(variety, spacing, nitrogen and phosphate), three replications

* It seems worth noting here that while a corresponding increase in
variability with mean yield can be met by a change in scale, a situation can
conceivably occur in which the mean yields and mean treatment effects may be
roughly constant from block to block, but soil patchiness in one block seriously
affect the accuracy of the whole experiment. The necessity for choosing as
uniform an experimental site as possible is well recognized, but it may be less
obvious that the result of adding further replications with the intention of
increasing accuracy might reduce it if greater heterogeneity is consequently
introduced.

For, if purely as the result of such patchiness, the n block variances $\sigma_r^2$ are
different, the information on a treatment effect will on the above assumptions
be proportional to

$$I = \sum_{r=1}^{n} \frac{1}{\sigma_r^2} = \frac{n}{\sigma_h^2},$$

where $\sigma_h^2$ is the harmonic mean variance.

The information used in the orthodox analysis will vary as

$$I' = \frac{n}{\sigma_a^2},$$

where $\sigma_a^2$ is the arithmetic mean variance. While I' will in most cases be little
less than I, the difference may be considerable in an exceptional case. If the
most variable block variance $\sigma_r^2$ had never been introduced, the information
retained will of course always be somewhat less than I, but the new value of
I' may not be less, and may even be greater than the original I'.
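
A small numerical sketch may make the point of this footnote concrete. The block variances below are invented for illustration only (three uniform blocks and one patchy one); the two functions evaluate the sum of reciprocal variances and n divided by the arithmetic mean variance respectively.

```python
def information(variances):
    """I: sum of reciprocal block variances, i.e. n / (harmonic mean variance)."""
    return sum(1.0 / v for v in variances)

def orthodox_information(variances):
    """I': n / (arithmetic mean variance), as used when blocks are weighted equally."""
    n = len(variances)
    return n / (sum(variances) / n)

# Hypothetical block variances: the last block is very patchy.
all_blocks = [1.0, 1.0, 1.0, 10.0]
without_worst = all_blocks[:-1]

info_all = information(all_blocks)                        # 3.1
orthodox_all = orthodox_information(all_blocks)           # 16/13, about 1.23
info_dropped = information(without_worst)                 # 3.0
orthodox_dropped = orthodox_information(without_worst)    # 3.0

print(info_all, orthodox_all, info_dropped, orthodox_dropped)
```

With these assumed figures, leaving out the patchy block lowers I only slightly (3.1 to 3.0) but raises I' from about 1.23 to 3.0, which is the situation the footnote describes.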





of the 16 treatments being contained in 6 blocks of 8 (Fig. 2).*
While the estimated efficiency ratio for this experiment only reached
108 per cent., these two layouts are both very attractive examples of
confounding.

In the 1936 experiment at Beni Suef, the treatments were all
combinations of

Nitrogen (four levels, $N_0$, $N_1$, $N_2$, $N_3$),

Spacing (close C, medium M, and wide W),

Phosphate ($P_0$ and $P_1$),

Potash ($K_0$ and $K_1$);

48 treatments replicated three times, making 144 plots. If we denote
the three independent nitrogen comparisons by

$$n_1 = N_0 + N_1 - N_2 - N_3,$$
$$n_2 = N_0 + N_2 - N_1 - N_3,$$
$$n_3 = N_0 + N_3 - N_1 - N_2,$$

then the second-order interactions

$n_1 \times P \times K$
$n_2 \times P \times K$
$n_3 \times P \times K$

were each confounded for a separate replication, making 6 blocks of
24 plots each. The contents of the 6 blocks may be represented
symbolically by the 6 sets of 24 :—

$$(C + M + W)\,[(P_0K_0 + P_1K_1)(N_0 + N_1) + (P_1K_0 + P_0K_1)(N_2 + N_3)],$$
$$(C + M + W)\,[(P_0K_0 + P_1K_1)(N_2 + N_3) + (P_1K_0 + P_0K_1)(N_0 + N_1)],$$
$$(C + M + W)\,[(P_0K_0 + P_1K_1)(N_0 + N_2) + (P_1K_0 + P_0K_1)(N_1 + N_3)],$$
$$(C + M + W)\,[(P_0K_0 + P_1K_1)(N_1 + N_3) + (P_1K_0 + P_0K_1)(N_0 + N_2)],$$
$$(C + M + W)\,[(P_0K_0 + P_1K_1)(N_0 + N_3) + (P_1K_0 + P_0K_1)(N_1 + N_2)],$$
$$(C + M + W)\,[(P_0K_0 + P_1K_1)(N_1 + N_2) + (P_1K_0 + P_0K_1)(N_0 + N_3)].$$

For this experiment the efficiency ratio for the unconfounded
effects was 118 per cent., and since N × P × K is only confounded
to the extent of 1/3, 79 per cent. for this interaction.

Covariance and Salty Plots.

In Table II it was noted that the figures for the Shaba and Ibrahimia
1936 experiments were those obtained after correction for salt.
The problem of salt accumulation in the upper layers of the soil can

* This is the obvious and practical method of confounding a 2 × 2 × 2 × 2
experiment. For four factors P, Q, R, S, the following system of blocks of 6,

O, PQRS; PS, QR; QS, PR.      S, PQR; P, QRS; Q, PRS.

PQ, RS; O, PQRS; PS, QR.      R, PQS; S, PQR; P, QRS.

QS, PR; PQ, RS; O, PQRS.      Q, PRS; R, PQS; S, PQR.

PS, QR; QS, PR; PQ, RS.       P, QRS; Q, PRS; R, PQS.

confounds, besides the third-order interaction completely, each first-order
interaction to the extent of 1/9. This system is not, as far as I am aware, of
any practical value, but it may be worth recording as an interesting mixture
of ordinary confounding and Yates' method of incomplete blocks.





be a very serious one in Egypt, owing to the detrimental effect of the
salt on the growth and condition of the crop. In two experiments
(Beni Suef and Shaba) the effect was fairly serious; in another



Fig. 1.*—Layout for Three-factor Experiment on Cotton at Biba,
Egypt, 1935.

Spacing C, M, W.

Nitrogen 0, 1, 2.

Phosphate 0, 1, 2.

* The blocks were finally arranged for convenience in a 6 × 2 formation,
but this does not affect the principle of the layout.


















[Fig. 2: plot diagram not reproduced.]

Fig. 2.—Layout for Four-factor Experiment on Cotton at Abu Qir,
Egypt, 1935.

Varieties A, G.

Nitrogen 0, 1.

Phosphate 0, 1.

Spacing 1, 2 (suffix).

(Ibrahimia) it was apparently of less importance.* Visual estimates
of the degree of saltiness of affected plots had, however, been made
(e.g. on a scale 0-10). The scale of such salt measures was a priori
rather arbitrary, but graphs indicated that a linear dependence of
yield on this salt measure fairly adequately summarized the
variation. There was no reason to suppose that the salt estimates
depended on the treatments (actually or psychologically), and it was
decided to correct as far as possible for the effect by means of an
analysis of covariance (see (7), p. 41, and (1), p. 257). The internal
evidence of the amount of variation in the salt index formally
ascribed to treatments in the analysis of variance supported the
assumption of no apparent treatment influence.

Part of the analysis of variance and covariance (the separate 

* The variability in this experiment was presumably due to other causes. 






components of the treatment variation are not shown) of the Shaba
experiment is given in Table III.

Table III.

Shaba Experiment—Analysis of Variance and Covariance.

x = Salt estimate (scale 0-10).
y = Yield (kantars/feddan).

                          D.F.     x^2.        xy.         y^2.

Main plot treatments        5      31.20     - 29.048     37.2104
Blocks                      2     152.18     -129.899    112.7495
Main plot error            10     128.49     - 78.005     97.8071

Main plot total            17     311.87     -236.952    247.7670

Sub-plot treatments        42     120.79     - 90.606    115.6551
Sub-plot error             84     279.33     -191.835    240.7536

Grand total               143     711.99     -519.393    604.1757


The method of adjustment used was to calculate the regression 
coefficient from the sub-plot analysis, and then use this coefficient 
for the main plot treatments. The adjusted set of treatment yields 
then comprised one single table, whereas if a separate coefficient had 
been calculated from the main plot analysis, the rather anomalous 
(though of course quite valid) position would be reached in which the 
individual treatment yields did not add up to the main plot totals. 
This method of adjustment will be quite efficient provided the regression
coefficients obtained from the two sections of the analysis of
variance are comparable. The adjusted error variance for the main 
plot analysis is here actually a little less than the value that would 
have been obtained by a straightforward adjustment. 

The corrected sums of squares for the main plot analysis (including
the error term) are now exactly given by the equation

$$S y'^2 = S (y - bx)^2$$

$$= S y^2 - 2b\,S xy + b^2\,S x^2,$$

since b is quite independent of the main plot analysis. For the sub-plot
section of the analysis it is known that such an adjustment
does not give the exact theoretical terms for tests of significance
(see, for example, (1), p. 266); but it was considered sufficient to
make the same adjustment for the components of the sub-plot
treatments, the error of b thus being neglected. Owing to the
small number of degrees of freedom for each component compared
with the sub-plot error term, this approximation could safely be
assumed to be satisfactory. The adjusted analysis of variance
values corresponding to Table III are given in Table IV.
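
The adjustment can be traced numerically from the lines of Table III. The sketch below takes b from the sub-plot error line and applies the equation above to the other lines. The names are illustrative only. The sum of squares of y for main plot treatments is entered as 37.2104, the figure implied by the main plot total of 247.7670 and by the adjusted value in Table IV; that reading of the printed figure is an assumption of this sketch.

```python
# Sums of squares and products from Table III: (S x^2, S xy, S y^2).
table_iii = {
    "main plot treatments": (31.20, -29.048, 37.2104),  # y^2 entry: assumed reading, see lead-in
    "blocks": (152.18, -129.899, 112.7495),
    "main plot error": (128.49, -78.005, 97.8071),
    "sub-plot treatments": (120.79, -90.606, 115.6551),
}
sub_plot_error = (279.33, -191.835, 240.7536)

# Regression coefficient of yield on salt estimate from the sub-plot error line.
b = sub_plot_error[1] / sub_plot_error[0]

def adjusted_sum_of_squares(sxx, sxy, syy, coeff):
    """S(y - b x)^2 = S y^2 - 2 b S xy + b^2 S x^2 for a fixed coefficient b."""
    return syy - 2.0 * coeff * sxy + coeff * coeff * sxx

adjusted = {name: adjusted_sum_of_squares(*line, b) for name, line in table_iii.items()}
# For the sub-plot error itself the same formula reduces to S y^2 - (S xy)^2 / S x^2.
adjusted["sub-plot error"] = adjusted_sum_of_squares(*sub_plot_error, b)

print(round(b, 5))
for name, value in adjusted.items():
    print(name, round(value, 3))
```

To within rounding in the last figure, the output agrees with the coefficient -0.68677 and the adjusted sums of squares of Table IV.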







Table IV.

Shaba Experiment—Analysis of Adjusted Variance.

                          D.F.        b.          y'^2.

Main plot treatments        5                     12.027
Blocks                      2                      6.104
Main plot error            10     (-0.60709)      51.266

Main plot total            17                     69.397

Sub-plot treatments        42                     48.175
Sub-plot error             83      -0.68677      109.007

Grand total               142                    226.579


While the covariance method probably gave as good an adjustment
for the effect of salt as is possible, the meaning of the modified
treatment yields should be clearly realized. It would perhaps be less
ambiguous to say that the yields have been equalized for salt rather
than corrected for salt (their mean remaining unaltered at the actual
mean level obtained over the experimental site, for all plots, salty or
non-salty). Since in severe cases the salt damage might be sufficient
to check the extent to which the crop can respond to treatments such
as extra nitrogen, the experimenter might complain that such equalized
yields, which will estimate, among other things, the average
nitrogen response over all the plots, may not tell him everything
he wants to know. An auxiliary investigation could, however, be
undertaken (e.g. by estimating the yields of plots (see (8)) which
were very severely damaged) to see how far such a check on the
response in the absence of salt appeared to be operating.

Experiments on Residual Effects. 

Before we “leave Egypt,” it is worth considering an example of a
type of experiment in which an existing experimental site is examined
for residual treatment effects. Often the value of an investigation
of this kind will lie not only in evaluating the residual effects of
fertilizers, but also in comparing their importance with the direct
effects of fertilizers re-applied. Since the combination of direct and
residual treatments will probably imply a greater number of treatments
than plots in the original blocks of the layout, confounding
may inevitably have to be considered in an experiment of this kind.
Even if the experiment for the second year was planned at the same
time as the original experiment, the numbers of plots per block in
the first year are hardly likely to be more than necessary, though
excessive initial confounding is not of course advisable.

In a wheat experiment (9) in 1934, three factors were present, the 







treatments being all combinations of three varieties, three spacings
and three levels of nitrogen. No confounding was adopted, and 
there were four replications of 27 plots. In 1935 it was required to 
examine the residual nitrogen and spacing effects in conjunction with 
further similar nitrogen and spacing treatments on maize sown on the 
same experimental site. 

We have therefore two separate sets of treatments, one set already 
present in the plots, the other to be superimposed. The layout 
adopted for the 1935 experiment consisted of three blocks represented 
diagrammatically as follows :— 

Residual Wheat.              Maize.                       Complete Scheme.

        S_0   S_1   S_2              S_0   S_1   S_2              X     Y     Z

N_0      X     Y     Z       N_0      A     B     C        A      I     II    III
N_1      Z     X     Y       N_1      C     A     B        B     III    I     II
N_2      Y     Z     X       N_2      B     C     A        C      II   III    I


This Latin square representation will be familiar from its use for
the 3 × 3 × 3 layout, and is simply a method of grouping the treatments.
Thus X represents the set of residual effects $N_0S_0$, $N_1S_1$, and
$N_2S_2$, and A similarly for the direct effects. If now the set of 27
treatments, AX, BY and CZ, is put in block I, and so on, we have a layout
of 81 treatments, of which two degrees of freedom of the third-order
interaction are confounded with blocks. If slight residual effects
of varieties are present, this will not, of course, affect the validity of
our comparisons, for naturally the assignment of the three particular
direct treatments to be given to the three plots in a block with a
particular residual nitrogen and spacing will be made at random
(for 1935 layout, see Fig. 3).
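
The grouping described above can be written out mechanically. The sketch below is one way of encoding the three Latin squares, with levels numbered 0, 1, 2 and groups X, Y, Z (or A, B, C) numbered 0, 1, 2; the function names and the modular-arithmetic encoding are conveniences of this illustration, not notation from the paper.

```python
from collections import Counter
from itertools import product

def latin_group(nitrogen, spacing):
    """Group index 0, 1, 2 (X, Y, Z for residual; A, B, C for direct treatments)."""
    return (spacing - nitrogen) % 3

def block_index(res_n, res_s, dir_n, dir_s):
    """Block 0, 1, 2 (I, II, III) from the 'Complete Scheme' square."""
    x = latin_group(res_n, res_s)   # residual wheat group
    a = latin_group(dir_n, dir_s)   # direct maize group
    return (x - a) % 3

# All 81 combinations of residual (N, S) and direct (N, S) levels.
layout = {t: block_index(*t) for t in product(range(3), repeat=4)}
block_sizes = Counter(layout.values())

# Number of direct treatments met by each residual combination within each block.
per_residual = Counter((t[0], t[1], blk) for t, blk in layout.items())

print(block_sizes)
print(set(per_residual.values()))
```

Each block receives 27 treatments, and every residual nitrogen and spacing combination occurs on exactly three plots of each block, as the text requires.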

Since there is now no actual replication, the error terms decided
on in the analysis of variance were the remaining third-order
interactions (14 d.f.), and the second-order interactions which involved a
first-order residual interaction (16 d.f.). Actually in this experiment
the direct treatment effects on the maize crop proved so large that 
any residual effects were negligible in comparison, though a significant 
nitrogen, and a just significant spacing, residual effect were obtained. 

Covariance and a Dairy-cow Nutrition Experiment. 

In the adjustment of the yields from salty plots in Egypt, the use 
of covariance was an auxiliary device successfully introduced to 
improve the accuracy of the experiment. It may thus be contrasted 
with the deliberate intention of using covariance when an experiment 
is designed. Since animal nutrition experiments have been little
discussed at meetings of this Section, I have made use of some
experimental data on the winter feeding of dairy cows, to illustrate this use
of the covariance method.





[Fig. 3: plot diagram not reproduced.]

Fig. 3.—Layout for Four-factor Experiment on Maize after Wheat,
Egypt, 1935.

Wheat residual treatments (italics): Spacing, C, M, W.

Nitrogen, 0, 1, 2.

Maize treatments: Spacing, C, M, W.

Nitrogen, 0, 1, 2.

First of all, it is observed that the individuality of the milk yield
makes it natural to look for a method in which a cow acts in some way
as “its own control.” One familiar method has been to choose
animals of suitable lactation period, and alternate the treatments fed.
Thus an experiment might be designed by giving one cow chosen at
random from a pair the treatments A, B, A, B over four successive
periods of five weeks, and the other, B, A, B, A. The two cows
would then constitute one replication for the comparison of A and B,
the grouping of the two cows eliminating any possible bias due to
the systematic order of the treatments A, B (for the description
of an experiment of this kind, see Watson and Ferguson (10)).

The standard error for such an experiment is likely to be small, 
so that if the cows, whose feeding is strictly rationed, react sharply 
to a change in the level of nutrition, this will probably be detected. 

Nevertheless, such an experiment has several limitations. Caution
is necessary in any investigation which requires the application of
treatments to experimental “units” already treated and possibly





affected by previous treatment. In the experiment referred to 
(10), it was considered advisable to examine the possible changes in 
slope of the lactation curve, as well as possible changes in mean level. 
Even if the milk yield itself is assumed to react sharply, other factors 
studied at the same time may show greater inertia, and the ultimate 
effect of a treatment be obscured. A ration should be allowed to 
exert its effect over as long a time as possible, before the value of the 
ration can be considered finally assessed. 

The change-over experiment, since no treatment difference was
detected, provided a useful set of uniformity data (see (10), p. 210),
on the basis of which it was pointed out (11) that a long-term experiment
in which the milk yields were corrected by covariance with the
initial yields, while not of course giving such a low standard error
as the change-over type, gave a standard error of the order of 8 per
cent. per cow, which is a figure sufficiently low to make such long-term
trials worth considering. This result is stressed, since some
experimenters have tended to adopt the change-over system without
very much consideration of the relative merits of the two types of
experiments.

In December 1934 a winter feeding trial (see Watson and Ferguson
(12)) was begun at Jealott’s Hill, in which twenty cows were all fed
on the same ration for a control period of three weeks, and then for a
further period of seventeen weeks on four different treatments :—

Ordinary winter ration of roots, hay and concentrates.

Artificially dried grass.

Low-temperature silage made with added molasses.

A.I.V. fodder.

(for full details of the experimental rations, see (12)). The division
of the cows into five groups or “blocks” was made partly on age and
partly on the date of calving. Treatments and shed position were
allotted to cows of the same block at random. Unfortunately the
difficulty of obtaining cows calving at the appropriate period led
to the inclusion in the first block of cows which ran dry before the
experiment was finished, so that for the full period only four blocks
were available. Since in addition one cow (on molassed silage) in
the fourth block developed a hard quarter, and had to be rejected,
the natural hazards of the experiment become evident; and the
twenty original animals were about the minimum number (allowing
for such hazards) on which an experiment with four treatments could
have been run.

The weekly variation in milk yield is illustrated in Fig. 4, which 
gives the yields for the fourth block, including those for the rejected 
cow. The average milk yields for the control period and experimental
period proper are reproduced in Table V. In this Table the






Fig. 4.—Milk Yields for Cows of Last “Block” (Dotted
Curve is for the Rejected Cow).


yields for the rejected cow are added in parentheses, as it will be 
interesting to see how far the data by themselves suggest their 
rejection. 


Table V.

Dairy-cow Winter Feeding Trial, 1934.

Mean Yield of Milk in lb./week for Control and Experimental Periods.

                                              Block
Treatment:              Period.          B.     C.     D.     E.

(1) Control             Control          279    245    197    250
                        Experimental     208    212    157    208

(2) Dried Grass         Control          223    224    179    232
                        Experimental     172    164    156    166

(3) Molassed Silage     Control          269    256    191   (311)
                        Experimental     183    204    136   (192)

(4) A.I.V. Fodder       Control          342    208    210    274
                        Experimental     247    150    159    210


In the original analysis of these figures ((12), p. 354), the yields
for the “missing” cow (E. 3) were estimated by the usual missing
plot technique to be 257 and 188, and an analysis of variance and
covariance then carried out, in order to obtain a corrected analysis
of the experimental yields. Quite an efficient analysis could have







been made on the straightforward differences between initial and
experimental yields, but the advantage of the covariance analysis
in place of such an arbitrary correction is not only that it is certain
to make the most efficient (linear) correction possible (a consideration
which with limited data is of some importance), but that it does this in
a standard way for all other characteristics (percentage fat, live
weight, etc.) in which we may be interested, for which analyses in
terms of differences might all have different efficiencies.

The theoretical effect of the missing cow was ignored in the
original analysis. Alternatively the block effect (which was
insignificant) might have been ignored. Any more exact analysis is hardly
ever necessary, but it may be of value to illustrate below one possible
form that the complete analysis of variance could take. A formal
way of estimating a missing figure is to assign a corresponding value 1
to a pseudo-variate $x_0$, and 0 for the remaining values; and carry
out the operations of covariance. This must of course be theoretically
identical with Yates' method (8) of obtaining missing values and
the corresponding exact tests, but for those who have become familiar
with the use of covariance, it may sometimes be helpful to realize
this.*

Table VI.

Dairy-cow Winter Feeding Trial, 1934.

Analysis of variance and covariance (including $x_0$).

$x_2$, experimental milk yield.
$x_1$, initial yield.
$x_0$, pseudo-variate.

              D.F.    x_0^2.    x_0 x_1.   x_0 x_2.    x_1^2.    x_1 x_2.    x_2^2.

Blocks          3     0.1875     23.62      11.25      17,113     9,859      5,849
Treatments      3     0.1875     13.62      -4.00       4,967     2,389      2,432
Error           9     0.5625     30.63       2.00       7,612     5,026      5,206

Total          15     0.9375     67.87       9.25      29,692    17,274     13,487


From the above table we obtain the regression coefficient of $x_1$ on
$x_0$ as

$$b_{10} = \frac{30.63}{0.5625} = 54.45.$$

Similarly $b_{20} = 3.56$.

* If, for example, a plot were missing in a 3 × 3 × 3 27-plot layout, its
value could be estimated as

$$y - b x_0 = -b,$$

since $y = 0$ and $x_0 = 1$. (No extra term $b\bar{x}_0$ is added, since we are not interested
in retaining the mean level of the experiment at that obtained by including a
zero or anomalous value for the plot in question.)







The estimated values for E. 3 are thus

311 - 54.45 (1) = 257,

192 - 3.56 (1) = 188,

to the nearest integer, as originally obtained.
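
The pseudo-variate estimate can be reproduced from the error line of Table VI alone. The sketch below only repeats that arithmetic; the variable names are illustrative.

```python
# Error line of Table VI: sums of squares and products involving the pseudo-variate x0.
err_x0x0 = 0.5625   # S x0^2
err_x0x1 = 30.63    # S x0 x1 (initial yield)
err_x0x2 = 2.00     # S x0 x2 (experimental yield)

# Regression coefficients of x1 and x2 on x0, from the error line.
b10 = err_x0x1 / err_x0x0
b20 = err_x0x2 / err_x0x0

# Recorded yields of the rejected cow E. 3, for which x0 = 1.
recorded_initial, recorded_experimental = 311, 192

estimated_initial = recorded_initial - b10 * 1
estimated_experimental = recorded_experimental - b20 * 1

print(round(b10, 2), round(b20, 2))
print(round(estimated_initial), round(estimated_experimental))
```

The rounded estimates are 257 and 188, the figures obtained in the original analysis by the missing plot technique.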

The adjusted error term for $x_2$, ignoring $x_0$, is obtained from the
error line by calculating

$$S x_{2\cdot 1}^2 = S (x_2 - b_{21} x_1)^2$$

$$= S x_2^2 - (S x_1 x_2)^2 / S x_1^2$$

$$= 5206 - (5026)^2/(7612)$$

$$= 1887 \text{ (8 d.f.)}.$$

The adjusted error term, correcting for $x_0$, is

$$S x_{2\cdot 10}^2 = S x_{2\cdot 0}^2 - (S x_{1\cdot 0} x_{2\cdot 0})^2 / S x_{1\cdot 0}^2,$$

where $S x_{2\cdot 0}^2$, etc., are calculated similarly to $S x_{2\cdot 1}^2$,

$$= 1131 \text{ (7 d.f.)}.$$

The difference 756 (1 d.f.) shows the reduction in the error sum of
squares due to estimating the yield for the rejected cow.

For the exact test of significance of treatment differences we have
the table (Table VII)

Table VII.

              D.F.   x_{1.0}^2.   x_{1.0} x_{2.0}.   x_{2.0}^2.    D.F.   x_{2.10}^2.

Treatments      3        —              —                —           3        809
Error           8      5,944          4,917            5,199         7      1,131

Total          11      9,968          7,533            7,633        10      1,940

where the entries in the “total” line are calculated from the sums of
the entries for “treatments” and “error” in Table VI, and the
final sum of squares 809 to be tested against 1,131 is obtained by
subtraction from the “total” figure 1,940. The corresponding value
of z is 0.256, which may be compared with the insignificant original
value of z obtained, 0.301. The error variance is equivalent to a
standard error per cow of 7.0 per cent.
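
The whole chain from Table VI to the value of z can be followed in a few lines. The sketch below takes the treatment and error lines of Table VI as printed, removes the regression on the pseudo-variate and then on the initial yield, and forms z as half the natural logarithm of the ratio of mean squares. The helper names are hypothetical.

```python
import math

# Lines of Table VI: (S x0^2, S x0x1, S x0x2, S x1^2, S x1x2, S x2^2).
treatments = (0.1875, 13.62, -4.00, 4967.0, 2389.0, 2432.0)
error = (0.5625, 30.63, 2.00, 7612.0, 5026.0, 5206.0)
total = tuple(t + e for t, e in zip(treatments, error))  # treatments + error

def eliminate_x0(s00, s01, s02, s11, s12, s22):
    """Sums of squares and products of x1, x2 after removing the regression on x0."""
    return (s11 - s01 ** 2 / s00, s12 - s01 * s02 / s00, s22 - s02 ** 2 / s00)

def residual_on_x1(s11, s12, s22):
    """Sum of squares of x2 after removing the regression on x1."""
    return s22 - s12 ** 2 / s11

error_ignoring_x0 = residual_on_x1(error[3], error[4], error[5])   # about 1887, 8 d.f.
error_adjusted = residual_on_x1(*eliminate_x0(*error))             # about 1131, 7 d.f.
total_adjusted = residual_on_x1(*eliminate_x0(*total))             # about 1940, 10 d.f.
treatments_adjusted = total_adjusted - error_adjusted              # about 809, 3 d.f.

z = 0.5 * math.log((treatments_adjusted / 3) / (error_adjusted / 7))

print(round(error_ignoring_x0), round(error_adjusted), round(total_adjusted))
print(round(treatments_adjusted), round(z, 3))
```

The adjusted sums of squares agree with Table VII to within a unit, and z comes to 0.256 as quoted.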

The only significant effect obtained in the experiment was in 
percentage solids-not-fat, for which the dried grass treatment 
appeared to give a higher adjusted value than the A.I.V. fodder. In 
assessing the significance of this result, it has to be borne in mind 
that more than one characteristic was statistically analysed; but the 
result is nevertheless of value as an indication for further research. 
These solids-not-fat percentages are included in Table VIII, for a 








Table VIII.

Dairy-cow Winter Feeding Trial, 1934.

Mean percentage solids-not-fat content of milk for control and experimental
periods (unweighted average for morning and evening milk).

                                              Blocks:—
Treatment:              Period.          B.      C.      D.      E.

(1) Control             Control          8.83    8.62    8.77    8.63
                        Experimental     8.83    8.75    8.69    8.37

(2) Dried Grass         Control          8.73   (7.42)   8.61    8.31
                        Experimental     8.98   (7.40)   8.71    8.40

(3) Molassed Silage     Control          8.47    8.26    8.72     —
                        Experimental     8.39    8.12    8.76     —

(4) A.I.V. Fodder       Control          8.33    8.68    8.68    8.67
                        Experimental     8.31    8.03    8.29    8.41


student might find it instructive to ask himself the following
questions:—

(i) The value of z, obtained by estimating values for the missing
cow E. 3, and then carrying out a straightforward analysis of
variance and covariance, was 0.809 (for P = 0.05, z = 0.735).
How far would this value be reduced by allowing exactly for the
missing cow ?

(ii) C. 2 represents a cow suspected of being tuberculous.
Its solids-not-fat figures are evidently anomalously low. What
internal evidence is afforded by the data that the analysis was made
more inaccurate by the retention of these figures ?

(iii) If C. 2 had also been rejected, what is the exact value
of z for testing the significance of the treatment differences ?

It may be anticipated, with regard to (i), that the value of z would
be very little reduced, since the chief disturbance in the test is the loss
of information due to the number of cows on molassed silage being
reduced, and since from the table of adjusted means (Table IX) it is
seen that the value of z arises chiefly from the contrast:—Dried
grass v. A.I.V. fodder, its value would consequently be little affected.


Table IX.

Dairy-cow Winter Feeding Trial, 1934.
Summary of adjusted solids-not-fat %.

                       Control.   Dried     Molassed   A.I.V.     Mean.    S.E.
                                  Grass.    Silage.    Fodder.

Solids-not-fat % ...    8.47      8.67      8.44       8.32       8.48     0.059


Two Further Trials.

The long-term experiment referred to above was carried out with 
strict randomization along the lines recognized in the technique 
of field experiments. The common-sense method of correcting the 
yields of cows by means of initial yields obtained before feeding par¬ 
ticular treatments need not, however, be ignored when the evidence of 
other available records is being assessed. The lactation curve, given 
the initial yield, seems fairly stable, at any rate for rationed winter 
feeding. In the above example no extra advantage was gained by 
division into blocks. 

In a former trial (13) at Jealott’s Hill carried out in the winter of 
1931-32, four groups of originally four cows each were fed respectively 
on (I) a normal winter ration of roots, hay and concentrates, (II) a 
ration including grass silage, (III) and (IV) rations including nitrogen- 
and non-nitrogen-treated types of artificially dried grass. 

The four groups were in this trial primarily regarded as four
“units,” and while the main object of the experiment was a study of
changes in the composition of the milk and butter (such as vitamin A 
or carotene contents), bulk samples only were taken for each group. 
Records of the individual weekly milk yields for the fourteen cows 
remaining in the experiment may, however, be used to verify that no 
significant differences in milk yield were obtained. Since all the 
cows were fed with a control ration from November 19th, 1931, to 
December 12th, before being transferred to their experimental 
rations (until March 17th, when they were again all brought on to 
the control ration), a straightforward analysis of variance and covari¬ 
ance, adjusting the average experimental yields by means of the 
yields for the initial control period, can be made. The summary 
of adjusted yields is given in Table X. 


Table X.

Dairy-cow Winter Feeding Trial, 1930.
Summary of adjusted milk yields.

Average Milk Yield.      I.       II.      III.     IV.      Mean.    S.E. per Cow.*

Lb./week/cow            191.8    194.2    177.6    178.4    184.4    17.86 (9.7%)
Number of cows            3        3        4        4


* Standard error of mean of 3 cows is 17.86/√3, etc.


As an example of the analysis of an outside trial—out of a number 
of cows, each fed with one of three different types of ration, cows were 
selected from the subsequent records provided they did not dry off 
during the experimental period, and also gave a uniform control
period for three weeks prior to the change in ration. The milk yields
during this initial period are used to correct the experimental yields.
Again the adjusted mean yields are not significantly different (Table 
XI). 


Table XI.

Outside Trial.

Summary of adjusted milk yields.

Average Milk Yield      Silage.   Dried       Control.   Mean.    S.D. per Cow.
(4 months).                       Lucerne.

Lb./week/cow            220.3     228.0       217.4      222.2    11.3 (5.1%)
Number of cows           15        10           6


For the four trials mentioned here and one other outside winter 
trial, the average standard error for a long-term experiment is about 
7 1 per cent, per cow. If any future trial of the same kind is being 
considered, this figure might be useful when the number of cows neces¬ 
sary to obtain any required order of accuracy has to be determined. 
For summer feeding, if the cows have free access to pasture, the cor¬ 
responding lack of control of rationing will probably imply a some¬ 
what higher standard error. 
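
As a rough illustration of how a per-cow standard error might be used when planning such a trial, the sketch below assumes two treatment groups of equal size with independent cows, so that the standard error of a difference between two group means is the per-cow figure multiplied by √(2/n). The function names are hypothetical, and the 9.7 per cent. figure is simply the value quoted in Table X, used here as an example input.

```python
import math

def se_of_difference(se_per_cow_pct: float, cows_per_group: int) -> float:
    """S.E. (as % of the mean) of the difference between two group means,
    assuming equal group sizes and independent cows."""
    return se_per_cow_pct * math.sqrt(2.0 / cows_per_group)

def cows_needed(se_per_cow_pct: float, target_se_pct: float) -> int:
    """Smallest group size for which the S.E. of a difference between
    two group means does not exceed the target."""
    return math.ceil(2.0 * (se_per_cow_pct / target_se_pct) ** 2)

# Example inputs: 9.7% per cow (Table X), groups of 4, and a 5% target.
print(round(se_of_difference(9.7, 4), 2))
print(cows_needed(9.7, 5.0))
```

The same two functions can be re-run with any other per-cow percentage when a different class of trial is being planned.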

If a co-operative experiment at more than one centre were carried 
out, the different centres could be regarded as different blocks, but 
at the same centre the results above indicate that division into blocks
may be less important. This should be borne in mind when any more 
complex experiment is contemplated, an increase in block size not 
necessarily leading to an increase in error. Since some doubt seems 
to exist on the most efficient plane of nutrition on which a cow should 
be fed, one interesting complex experiment would appear to be to 
feed different treatments, but with basal rations making up the total 
ration to, say, two nutritional levels, so that the efficiency of these 
different levels could be investigated at the same time. 

Sampling Errors . 

The very wide and sometimes difficult problem of accurate 
sampling is well appreciated by all members of this Section. I have 
no intention of trying to discuss the problem generally, but since 
it is difficult when dealing with the use of statistical method to 
avoid all reference to the subject, I have included below one ele¬ 
mentary but interesting example (see Watson and Ferguson (14)), 
which helps to stress the assistance which some form of the analysis 
of variance always gives in disentangling sources of variation. 

The contents of each of two circular wooden pit silos, 9 ft. 6 in. in 
diameter, were being carted away in loads, and the opportunity was
taken of checking the accuracy of the usual sampling method. For
several loads the usual practice of removing a small portion of each 
forkful to form one representative sample of about 2-3 lb. from each 
load for laboratory examination was replaced by the practice of 
sampling in triplicate with three different samplers. To compare the 
variation in samples so obtained with the sub-sampling error corre¬ 
sponding to duplicate determination of dry matter in the laboratory, 
the samples from the remaining loads were analysed in duplicate for 
dry matter: a single dry matter determination only was made for 
each of the triplicate field samples. 



Fig. 5a.


Table XII.

Sampling Results on Silage.

Percentage dry matter.

          Separate Sampling.                 Duplicate Determinations.

Load.      A.        B.        C.        Load.      (i).      (ii).

  5       19.80     20.50     20.80        1       20.55     21.65
  7       21.25     22.10     21.90        2       21.00     21.20
  8       22.45     22.70     23.15        3       20.40     20.35
  9       22.45     22.55     23.40        4       20.40     20.00
 10       22.85     22.80     22.80        6       19.95     21.15
                                          11       23.85     22.95
Mean      21.76     22.13     22.41       12       22.75     22.65



Fig. 5b.





Table XIII.

Sampling Results on Silage.

Percentage dry matter.

          Separate Sampling.                 Duplicate Determinations.

Load.      A.        B.        C.        Load.      (i).      (ii).

  5       23.00       ?       23.05        1       19.90     18.95
  6       22.85     22.85     23.10        2       18.05     17.65
  7         ?       21.?      21.75        3       19.75     19.95
  8       20.85     21.1?     20.80        4       21.05     21.85
  9       21.50       ?       21.90       10       21.65     21.95
                                          11       21.80     22.75
Mean      21.96     22.07     22.12


The analysis of variance has four quite separate sections, dealing 
with the loads sampled in triplicate, and the remainder, for each of 
the silos. 

Table XIV.

Sampling Results on Silage.

Analysis of variance.

                                 D.F.    Sum of       Mean        S.E.
                                         Squares.     Square.

A.I.V.
  Duplicates:
    Among loads                    6     18.1261      3.0210
    Between duplicates             7      1.8362      0.2623      0.5122 (2.40%)
  Sampling:
    Among loads                    4     13.7250      3.4313
    Among samplers                 2      1.0630      0.5315
    Interaction                    8      0.6570      0.0821      0.2865 (1.30%)

Molasses
  Duplicates:
    Among loads                    5     28.6467      5.7293
    Between duplicates             6      1.3675      0.2279      0.4773 (2.34%)
  Sampling:
    Among loads                    4      8.7523      2.1881
    Among samplers                 2      0.0670      0.0335
    Interaction                    8      0.3297      0.0412      0.2030 (0.92%)



No differences are apparent between samplers for the molasses
silage, but the variation between them is significant for A.I.V. (The
variance 0.5315 compared with the residual variance 0.0821 gives a
value of z = 0.9340, which has a significance level of P = 0.020.)
This indicates a bias among the samplers, a result quite possible
owing to the inevitable variation within the same load, and the
somewhat different positions of the samplers, one of whom was inside
the silo. Nevertheless, the magnitude of the bias is small, being for a
single sample of the order of

\[ \sqrt{\tfrac{1}{5}\,(0.5315 - 0.0821)} = 0.2998, \]

or 1.36 per cent. of the general mean, and it was not considered
particularly important.
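
The two figures quoted for the A.I.V. samplers can be retraced with a few lines of arithmetic. The sketch below assumes that z is Fisher's z (half the natural logarithm of the variance ratio) and that the divisor 5 is the number of loads sampled in triplicate; the variable names are illustrative only, and the general mean of 22.10 is taken as the average of the three sampler means in Table XII.

```python
import math

ms_samplers = 0.5315      # mean square among samplers (2 d.f.), A.I.V. silage
ms_interaction = 0.0821   # sampler x load interaction mean square (8 d.f.)
n_loads = 5               # loads sampled in triplicate (assumed divisor)
general_mean = 22.10      # average of the sampler means 21.76, 22.13, 22.41

# Fisher's z: half the natural log of the ratio of the two mean squares.
z = 0.5 * math.log(ms_samplers / ms_interaction)

# Estimated standard deviation of the sampler bias for a single sample.
bias_sd = math.sqrt((ms_samplers - ms_interaction) / n_loads)
bias_pct = 100.0 * bias_sd / general_mean

print(round(z, 4), round(bias_sd, 4), round(bias_pct, 2))
```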

A curious result of the analysis is the apparent larger variation 
between duplicates than the interaction term between samplers and 
loads. This could be tested by the z test; but the comparison of 
these residual variances may usefully be treated here as a particular 
case of testing the homogeneity of a set of variances, of which some 
may prove larger or smaller than the remainder. 

To test the homogeneity of a set of k estimated variances $s_r^2$,
with $n_r$ degrees of freedom, we obtain a “crude” value of $\chi^2$ by
computing *

\[ n \log_e s^2 - \sum n_r \log_e s_r^2 \quad \text{or} \quad 2.3026\,\bigl(n \log_{10} s^2 - \sum n_r \log_{10} s_r^2\bigr), \]

where $s^2$ is the pooled variance with $n = \sum n_r$ degrees of freedom.
If this value of $\chi^2$ (with k − 1 degrees of freedom) appears significant,
it is advisable to calculate a “corrected” value by dividing by a
correcting factor C, where

\[ C = 1 + \frac{1}{3(k-1)} \Bigl\{ \sum \frac{1}{n_r} - \frac{1}{n} \Bigr\}, \]

the $\chi^2$ approximation being then much improved.

If possible heterogeneity between groups of variances is of special
interest, the pooled variances for the different groups should be
considered separately from the variation within groups, the “crude”
value of $\chi^2$ being additive.

From the four residual variances of Table XIV, we obtain a value
0.2464 (13 d.f.) for the mean variance “between duplicates,” and
0.0617 (16 d.f.) for the mean “interaction term.” The mean for all
the degrees of freedom (29) is 0.1445.

Since

\[ 2.3026\,(29 \log_{10} 0.1445 - 7 \log_{10} 0.2623 - 6 \log_{10} 0.2279 - 8 \log_{10} 0.0821 - 8 \log_{10} 0.0412) = 7.66 \]

and

\[ 2.3026\,(29 \log_{10} 0.1445 - 13 \log_{10} 0.2464 - 16 \log_{10} 0.0617) = 6.68, \]

we obtain an analysis of residual variances as in Table XV.

The value 6.46 is 6.68/1.035, where

\[ 1.035 = 1 + \tfrac{1}{3}\bigl(\tfrac{1}{13} + \tfrac{1}{16} - \tfrac{1}{29}\bigr). \]

The significance level corresponding to $\chi^2$ (1 d.f.) = 6.46 is P = 0.011.

* For the theory of this test, and its relation to the Neyman-Pearson
likelihood criterion, see (15).
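
The homogeneity test described above is short enough to be written out directly. The following is a minimal sketch, not a library routine: the function name `variance_homogeneity_chi2` is hypothetical, natural logarithms are used in place of 2.3026 × log10, and the inputs are the rounded mean squares quoted in the text, so the last decimal place may differ slightly from the printed values.

```python
import math

def variance_homogeneity_chi2(groups):
    """groups: list of (degrees_of_freedom, variance) pairs.
    Returns (crude chi-square, corrected chi-square, degrees of freedom)."""
    n = sum(df for df, _ in groups)
    pooled = sum(df * var for df, var in groups) / n
    crude = n * math.log(pooled) - sum(df * math.log(var) for df, var in groups)
    k = len(groups)
    # Correcting factor C = 1 + {sum(1/n_r) - 1/n} / {3(k - 1)}
    c = 1.0 + (sum(1.0 / df for df, _ in groups) - 1.0 / n) / (3.0 * (k - 1))
    return crude, crude / c, k - 1

# The four residual variances, and the two pooled groups, from the silage data.
four_residuals = [(7, 0.2623), (6, 0.2279), (8, 0.0821), (8, 0.0412)]
two_groups = [(13, 0.2464), (16, 0.0617)]

total_crude, _, total_df = variance_homogeneity_chi2(four_residuals)
group_crude, group_corrected, group_df = variance_homogeneity_chi2(two_groups)

print(round(total_crude, 2), total_df)
print(round(group_crude, 2), round(group_corrected, 2), group_df)
```

Because the crude values are additive, the "within groups" component follows by subtraction of the second crude value from the first.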



Table XV.

Analysis of Residual Variances.

                                   D.F.    Crude $\chi^2$.    Corrected $\chi^2$.

Within groups                        2        0.98
Duplicates v. “Interaction”          1        6.68             6.46
Total                                3        7.66



This result was hardly to be expected, since any sub-sampling 
error in the laboratory should be implicit in the variation for the 
separate field sampling errors. The standard error corresponding 
to the error between duplicates was 2.4 per cent., and compared
favourably with the usual errors previously experienced, so that the 
only explanation of this apparent reduction in the error is that since 
it was known which samples corresponded to the sampling in tripli¬ 
cate, extra care may have been given to the determination of the dry 
matter percentages. The smallness of the interaction term indicates 
in any case that there is no evidence at all for random field sampling 
errors, as distinct from the question of bias. 

The Efficient Detection of Treatment Effects . 

Statisticians are well aware that an efficient test of significance of a 
possible discrepancy does not necessarily imply making an estimate 
of its magnitude. Nevertheless not many would fail to recognize 
that directly they have to deal with several possible differences, the 
more they can condense the description of the treatment effects, the 
more likely they will be to make efficient tests of them. This is well 
illustrated by the factorial experiment and its analysis. We notice 
that the analysis of variance isolates the main effects and low-order 
interactions, and if, as is so often the case, the treatment effects are 
chiefly contained in these terms, leads simultaneously to an adequate 
description of the treatment effects and to the most efficient method 
of testing them. 

This function of a statistical analysis is perhaps worth stressing, 
since while if any detailed theory of the mode of action of treatments 
is advanced, the statistician has merely to test its adequacy, for the 
efficiency of any more empirical statistical reduction of the data he 
may be largely responsible. The following five references to different 
types of data are all intended to illustrate this same point. 

(i) The value of splitting up a response curve into its linear regres¬ 
sion and other orthogonal components has probably been more widely 
appreciated since the introduction of the 3 × 3 × 3 27-plot layout,
but the advisability of splitting up interaction degrees of freedom
when the main effects can be condensed into regression effects should
never be lost sight of.

A very obvious illustration of this is given by the results of a 
1933 sugar-beet experiment. This experiment consisted of four main 
spacing treatments in a 4 × 4 Latin square, each plot of which was
divided into four sub-plots for different rates of manuring. The 
treatment totals for weight of tops are given in Table XVI. 

Table XVI.

Sugar-beet Experiment, Jealott’s Hill.

Weight of tops after wilting.

(Total weight of four plots in lb.)

                        Spacing.
              S0        S1        S2        S3        Total.

M0            780       733       654       760       2,927
M1            845       842       788       757       3,232
M2          1,074     1,028       941       861       3,904
M3          1,279     1,187     1,081       936       4,483

Total       3,978     3,790     3,464     3,314      14,546


While the spacing treatments (S0: 12-in. rows singled to 9 ins.,
S1: 12-in. rows singled to 12 ins., S2: 18-in. rows singled to 9 ins.,
S3: 24-in. rows singled to 9 ins.) are not in equal “steps,” a regression
term on the order of closeness is found to isolate the main
spacing effect; and the regression term for the manuring treatments
(rates of complete fertilizer) the main manuring effect. It is
therefore natural to isolate the interaction term “spacing regression ×
manuring regression,” which is highly significant, whereas a test of
the entire set of interaction terms would have missed this effect.

The interaction term is obtained from the table of treatment
totals by use of the multiplying scheme:

              S0        S1        S2        S3

M0            -9        -3         3         9
M1            -3        -1         1         3
M2             3         1        -1        -3
M3             9         3        -3        -9

which is obtained from the symbolic multiplication of the main
regression effects

\[ (-3M_0 - M_1 + M_2 + 3M_3)(3S_0 + S_1 - S_2 - 3S_3). \]

The analysis of variance is shown in Table XVII. 
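
The single-degree-of-freedom sums of squares can be retraced from the treatment totals alone. The sketch below assumes the usual rule that a contrast among totals has sum of squares (contrast)² / (plots per total × sum of squared coefficients); the helper name `contrast_ss` is hypothetical, and the totals are those of Table XVI.

```python
# Treatment totals from Table XVI: rows M0..M3, columns S0..S3 (lb. per four plots).
totals = [
    [780, 733, 654, 760],
    [845, 842, 788, 757],
    [1074, 1028, 941, 861],
    [1279, 1187, 1081, 936],
]
m_coef = [-3, -1, 1, 3]   # manuring regression coefficients
s_coef = [3, 1, -1, -3]   # spacing regression coefficients (order of closeness)

def contrast_ss(value, coef_sq_sum, plots_per_total):
    """Sum of squares for one contrast among totals."""
    return value ** 2 / (plots_per_total * coef_sq_sum)

# Interaction: product coefficients m_i * s_j applied to the 16 cell totals.
interaction = sum(m_coef[i] * s_coef[j] * totals[i][j]
                  for i in range(4) for j in range(4))
coef_sq = sum(m * m for m in m_coef) * sum(s * s for s in s_coef)   # 20 * 20
ss_interaction = contrast_ss(interaction, coef_sq, 4)

# Main-effect regressions use the marginal totals (16 plots each).
spacing_totals = [sum(row[j] for row in totals) for j in range(4)]
manuring_totals = [sum(row) for row in totals]
spacing = sum(c * t for c, t in zip(s_coef, spacing_totals))
manuring = sum(c * t for c, t in zip(m_coef, manuring_totals))
ss_spacing = contrast_ss(spacing, 20, 16)
ss_manuring = contrast_ss(manuring, 20, 16)

print(interaction, round(ss_interaction), round(ss_spacing), round(ss_manuring))
```

The three results can be compared with the regression lines of the analysis of variance for this experiment.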

This interaction effect between manuring and spacing is one that
is quite often observed. In the present experiment the significant
spacing main effect is due to the differences in yield with manuring
at the different spacings, and not at all to differences in yield with no
manuring, but since a significant interaction between spacing and
manuring has been observed, the extent to which a main effect arises
from different levels of the other factor is naturally examined.

Table XVII.

Sugar-beet Experiment, Jealott’s Hill.

Analysis of variance.

                                D.F.    Sum of Squares.    Mean Square.

Spacing regression                1         16,791           16,791
Spacing remainder                 2            331              165.5
Rows                              3         20,919            6,972.9
Columns                           3          1,297              432.4
Error (a)                         6          4,682              780.3
Manuring regression               1         89,111           89,111
Manuring remainder                2          1,835              917.5
Interaction (reg. × reg.)         1          7,208            7,208
Interaction remainder             8          2,321              290.1
Error (b)                        36         29,259              812.8

Total                            63        173,753



(ii) In the last example the treatment effects were isolated in 
regressions on a simple order of spacing or level of nutrient, but any 
other variate on which a regression effect is observed may aid in 
detecting or confirming treatment differences. 

Thus the analysis of variance given in Table XVIII refers to the
1935 results of a Brussels sprouts experiment to determine the relative
values of organic and inorganic nitrogenous manures (A. H. Lewis
(16)). In this analysis the effect of treatments (different forms of
nitrogen) on the yield (of marketable sprouts, fresh weight in lb.)
might appear doubtful, but if the degree of freedom corresponding
to the regression of yield on the pH treatment values (which were
significantly different) is isolated, the effect of treatments on yield
is at once clearly established. No causal relation between pH and
yield is necessarily proved by this result, but a more careful study of
the relationship with pH, as indicated, for example, by the individual
plot values, is at once suggested.


Table XVIII.

Brussels Sprouts Experiment, Datchet.
Analysis of variance.

                           D.F.    Sum of Squares.    Mean Square.

  Regression on pH           1         593.46            593.46
  Remainder                  8         320.80             40.10
Treatment Total              9         914.26            101.58
Blocks                       2         351.68            175.84
Error                       18         914.80             50.82

Total                       29       2,180.74

The isolation of single treatment degrees of freedom, as in the 
above two examples, must of course arise naturally, otherwise there 
can be no case for splitting up the set of treatment differences when 
testing them. 



(iii) In some experiments by Mr. G. E. Blackman on the combined
effect of shading and forms of nitrogen on the composition of the
herbage on a lawn, the variation in clover content was studied by
estimating every week the percentage area covered by clover in the
different plots. In order to compare as efficiently as possible the
apparent differences due to treatment, the changes had to be
summarized, and the predominant factor in such a summary was seen
graphically to be the regression of percentage area covered with time.
The variation in the first experiment for the four replications of one
treatment (shade plus sulphate of ammonia) is shown in Fig. 6.

Since the percentage area figures are less able to vary when near
0 (or 100) than when in the middle of the range, a transformation of
the figures might be found to lead to improvement. The variate
$\log\{x/(100 - x)\}$ was consequently tried, and though it is not
suggested that a linear regression in terms of this variate always
completely summarized the changes, it usually had the effect of
straightening out regression lines sloping down to 0 on the original
scale.† Analyses of variance were made on the linear regressions of
this variate with time.

The transformation was an empirical one, though, as a rationaliza¬ 
tion of its use, we might notice that the change in percentage area 
will not only be dependent on the amount of the plant species there 
to expand or decrease, but also on the area available to expand in, 
or the competition of the species causing it to decrease. Hence we 
could suppose that for a constant cause of change, the percentage 
area x would depend on an equation

\[ \frac{dx}{dt} = kx(100 - x), \]

whence by integration

\[ \log_e \frac{x}{100 - x} = Bt + C. \]

This last equation can also be written

\[ x = \frac{100}{1 + \exp\{-(Bt + C)\}}, \]

in which form it will be recognized as a well-known formula for the 
law of population growth. 
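
The link between the growth law and the transformed variate can be checked numerically: if the percentage area follows the logistic form quoted above, the variate log{x/(100 − x)} is exactly linear in time. The sketch below is only an illustration; the constants `B_DEMO` and `C_DEMO` are hypothetical values chosen for the demonstration and are not estimates from the lawn experiments.

```python
import math

def logit_percent(x: float) -> float:
    """Transformed variate log{x / (100 - x)} for a percentage area x."""
    return math.log(x / (100.0 - x))

def logistic_percent(t: float, b: float, c: float) -> float:
    """Percentage area at time t under the logistic growth law."""
    return 100.0 / (1.0 + math.exp(-(b * t + c)))

B_DEMO, C_DEMO = -0.3, 1.0   # hypothetical rate and starting level
weeks = list(range(0, 11))
areas = [logistic_percent(t, B_DEMO, C_DEMO) for t in weeks]
transformed = [logit_percent(x) for x in areas]

# The transformed values fall on the straight line B*t + C.
for t, x, y in zip(weeks, areas, transformed):
    print(t, round(x, 1), round(y, 3))
```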

(iv) The observations in Table XIX refer to a germination
experiment on peas carried out by Mr. W. G. Templeman. Fifty seeds
were sown for each replication on May 5th, 1936, in pans placed in a 
pot-culture shed. The full records are only reproduced here for the 
first treatment. 

Table XIX.

Germination Experiment on Peas.
No. of seeds germinated (Treatment 1).

Block.   Date:  May 10th.  11th.  11th.*  12th.  13th.  14th.  15th.  18th.

A                   -        3      9      29     36     36     38     40
B                   -        9     18      30     36     39     40     41
C                   -        8     10      18     32     36     38     39
D                   -        -      -       4     16     30     36     42
E                   1       10     15      24     35     41     41     43
F                   -        3      6      16     33     36     39     39

Total               1       33     58     121    188    218    232    244

* Counted later in same day.


† A straightforward logarithmic transformation would have had a similar
effect, but would of course fail at the other end of the range. The
transformation actually used was already familiar owing to its use in
assessing the significance of fairly large changes in percentage area
in other data.





The final numbers for all the treatments are given in Table XX.


Table XX.

Germination Experiment on Peas.
Germination counts, May 18th.

Block.   Treatment:   1.     2.     3.     4.     5.     6.     7.

A                     40     43     42*    39     44     42     40
B                     41     36     40     41     43     44     45
C                     39     41     45     45     45     42     39
D                     42     35     39     43     46     44     41
E                     43     38     40     45     44     44     41
F                     39     34     44     44     42     45     39

Total                244    227    250    257    264    261    245

* Estimated.


From Table XX the value of $\chi^2$ within treatments is 24.0 (34 d.f.),
while between treatments $\chi^2$ = 22.8 (6 d.f.). This is a sufficient
indication of treatment differences, though the lack of validity of a
$\chi^2$ test of treatment effects if heterogeneity is already present need
hardly be stressed (see (19) p. 76).
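
The text does not state how the between-treatments value was computed. One reading that reproduces it is the ordinary binomial dispersion statistic with a pooled germination fraction, sketched below under that assumption; the treatment totals are the column sums of the May 18th counts (the estimated value being treated as an observation), and the variable names are illustrative.

```python
# Column totals of the final germination counts for treatments 1..7.
treatment_totals = [244, 227, 250, 257, 264, 261, 245]
seeds_per_treatment = 6 * 50      # six blocks of fifty seeds each

# Pooled germination fraction over all treatments.
p = sum(treatment_totals) / (len(treatment_totals) * seeds_per_treatment)
expected = seeds_per_treatment * p

# Binomial dispersion chi-square between treatments (6 d.f.).
chi2_between = sum((t - expected) ** 2 for t in treatment_totals) / (
    seeds_per_treatment * p * (1.0 - p)
)

print(round(p, 4), round(chi2_between, 1))
```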

The remaining problem of the effect of treatments on rate of 
germination is, in the absence of any precise theory and definition of 
rate of germination, not quite so readily solved, but as a provisional 
method at least of attacking the problem a “ rate index ” was 
constructed from the mean fraction of finally emerging seedlings 
taken over the several times. For A1, for example, this gives a
value

(0 + 3 + 9 + 29 + 36 + 36 + 38 + 40)/(8 × 40) = 0.598.

This index has no absolute meaning, but since each count includes
the count previously made, the number coming up at any date is
weighted in accordance with this date’s earliness or lateness, so
that an efficient analysis of the effect of treatments on rate of
germination might reasonably be expected from this index.†
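
A small sketch of the index as described: the cumulative counts are summed and divided by the number of counting occasions times the final count. The function name `rate_index` is hypothetical. Evaluated on the block A counts for treatment 1 it gives about 0.597, which agrees with the printed figure to two decimal places.

```python
def rate_index(cumulative_counts):
    """Mean, over the counting occasions, of the fraction of the finally
    emerged seedlings that had already emerged at each occasion."""
    final = cumulative_counts[-1]
    return sum(cumulative_counts) / (len(cumulative_counts) * final)

# Block A, treatment 1: cumulative counts at the eight counting occasions
# (a dash in the table is taken as zero).
counts_a1 = [0, 3, 9, 29, 36, 36, 38, 40]
index_a1 = rate_index(counts_a1)

# A pan in which every seedling is up at the first count scores 1.
index_instant = rate_index([40] * 8)

print(round(index_a1, 3), index_instant)
```

Earlier emergence pushes the index towards 1 and later emergence towards 1/8 (with eight counts), which is the sense in which each date is weighted by its earliness.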

The analysis of variance for this experiment is given in Table 
XXI. 

No significant treatment differences were obtained, but a just 
significant block effect (z = 0.575, while z (P = 0.05) = 0.467) will
be a reminder that positional effects may be important in greenhouse 
as well as in field experiments. 

† For counts at equal intervals, the index is equivalent to finding the
“mean date of emergence.”


Table XXI.

Germination Experiment on Peas.
Analysis of variance of “rate index.”

                   D.F.    Sum of Squares.    Mean Square.

Treatments           6         0.4526            0.0754
Blocks               5         4.7447            0.9489
Error               29*        8.7231            0.3008

Total               40        13.9204

* One missing value.


(v) A general account of the probit-log. dose method of evaluating
dosage-mortality relations (e.g. the toxicity of insects to various
strengths of an insecticide) proposed by Bliss and others was given
by Irwin recently (17). In some brief notes (18) on the use of the
method, I stressed the difficulty sometimes caused by heterogeneity
in the population, and cited some laboratory data of Dr. H. H. S.
Bovingdon’s on the differential resistance to a fumigant of two
different age-groups of the bed-bug (Cimex lectularius). My reason
for mentioning these data again here is that they illustrate the value
of an adequate reduction of the data in making an efficient comparison
of the two age-groups.

The mortality figures for the different dosages of the fumigant 
(at 24 hours’ exposure) are given in Table XXII, separately for the
two groups, “adults” and “nymphs.”

Table XXII.

Kills in Fumigation Experiment on the Bed-bug.

Dose              Adults.            Nymphs.            Total.
(Mg./litre).   Total.   Dead.     Total.   Dead.     Total.   Dead.

  7.83           20       2         10       0         30       2
 11.76           25       3          9       5         34       8
 17.2            21       6          8       3         29       9
 19.0            23       6         11      10         34      16
 20.9            24      19          7       6         31      25
 23.2            22       8         10       5         32      13
 24.6            13       8         18      17         31      25
 28.0             9       9         27      24         36      33
 29.8            28      21          4       3         32      24
 32.0            25      19          3       3         28      22
 36.9            15      14         17      17         32      31
 40.2            17      17         15      15         32      32
 41.9             8       8         20      20         28      28

Total           250     140        159     128        409     268


It is known that the use of $\chi^2$ for comparing each dosage separately
would not be valid; but even if it were, such a test would be very 
insensitive to the particular contrast adults v. nymphs in which we 
are interested, and the totals for all dosages could not be used owing 
to the variable numbers for the different dosages. The probit-log. 
dose method enables us, however, to express the dosage-mortality 
relation for each age-group separately, the regression equations being 
estimated to be 


Adults : y = 4.985 x − 1.563
Nymphs : y = 5.082 x − 0.838,






where y stands for probits and x for log. dose. On this scale the
(significant) difference between the two groups appears (the error is of
course fairly large) as a simple parallel displacement of one line
relative to the other, so that the dosage required to produce any
particular mortality in the nymphs must be multiplied by a constant
factor of about

antilog {(1.563 − 0.838)/5} = 1.4

before the same effect would be produced in the adults.
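
The factor of about 1.4 follows from treating the two probit lines as parallel. The sketch below assumes common (base-10) logarithms of dose and a common slope of 5, which is the divisor used in the text; the variable names are illustrative only.

```python
intercept_adults = -1.563   # constant term of the adult probit line
intercept_nymphs = -0.838   # constant term of the nymph probit line
common_slope = 5.0          # approximate common slope (assumed, as in the text)

# For equal probits: slope * x_adult + a_adult = slope * x_nymph + a_nymph,
# so the horizontal displacement in log10(dose) is
log_dose_shift = (intercept_nymphs - intercept_adults) / common_slope

# Multiplying factor on the dose scale (antilog of the displacement).
dose_factor = 10.0 ** log_dose_shift

print(round(log_dose_shift, 3), round(dose_factor, 2))
```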

Further Notes on the Analysis of Heterogeneous Data . 

Any experimental counts on heterogeneous data clearly require 
replication before treatment differences can be validly established. 
With field experiments a valid use of $\chi^2$ for such tests of significance
is rarely possible, and with more controlled indoor experiments often 
fails. In the evaluation of insecticides by the probit-log. dose method, 
replication exists in the several doses at which the insecticide 
is tested, though it is clearly necessary before rigorous comparison 
of such results is possible that a batch of insects used at any one time 
can be regarded as a sufficiently random sample from the whole popu¬ 
lation of insects that may be used in evaluating similar insecticides. 

In other insecticidal experiments, more empirical comparison 
of treatments is often sufficient. For field experiments, in which 
any number of insects may naturally occur on any plot, the square 
root transformation has often been found useful in analysing the 
results. A discussion of this transformation has been given elsewhere 
(19). For more controlled variation where the original number n
used in any replication is known, the corresponding theoretical
transformation is $y = \sin^{-1}\sqrt{t/n}$, where t is the number counted
(e.g. as dead). This transformation was also discussed in the paper
referred to, but had not at the time been considered particularly 
necessary. 

For routine insecticide tests in which a number of insects are 
allowed access to poisoned bait, or are subjected to toxic sprays, so 
that no exact dose for any one insect is known and replication is 
essential, it is found that heterogeneity, either in the insect population 
or in the variation of toxic dose received, renders some routine form 
of analysis of variance of the data desirable. Recently the $\sin^{-1}\sqrt{x}$
transformation has frequently been used for this type of experiment,
in order that a fairly standard analysis of such experiments could if
possible be decided on. Once a direct table of $\sin^{-1}\sqrt{x}$ is available,*
the use of this variate becomes no more laborious than that of $\sqrt{x}$.

* I have given a short table in (20), (for a somewhat different purpose).
Dr. C. I. Bliss has independently published a fuller table elsewhere (22).

In Table XXIII some results by Mr. F. J. D. Thomas on the
toxicity of various spray treatments to blowflies (Calliphora
erythrocephala) illustrate this kind of data. At each replication 25
insects are used. Before treatment they are “doped” with acetone.

Table XXIII.

Insecticide Test on Blowflies.
No. of dead flies.

Treatments:       A.      B.      C.      D.      E.      F.      G.*

Replications:
      1           24     (25)     17      17      18      23       1
      2           25     (25)     15      17      25      25       1
      3           24     (25)     12      17      24      23       1
      4           21     (25)     20      22      16      23      10
      5           25     (25)     21      13      22      23       4, 6

* Control (with one extra replication).


While any treatment giving over about 95% mortality is omitted 
from the statistical analysis, it may still be usefully included in the 
summary. 

Table XXIV.

Insecticide Test on Blowflies.
Summary of results.

                             A.      B.       C.      D.      E.      F.      G.      S.E.

% kill                       95     (100)     68      69      84      94       4
$y = \sin^{-1}\sqrt{x}$      1.41   (1.57)    0.98    0.99    1.22    1.34    0.37    0.082
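
As an illustration of how the summary values arise, the sketch below applies the angular transformation (in radians) to the counts for treatments C and D, with 25 insects per replication, and averages the transformed values. It assumes the percentage kill is the raw pooled percentage; the function name `angular` is hypothetical.

```python
import math

N_INSECTS = 25   # insects used at each replication

def angular(dead: int, n: int = N_INSECTS) -> float:
    """Angular transformation y = arcsin(sqrt(dead / n)), in radians."""
    return math.asin(math.sqrt(dead / n))

# Dead flies in the five replications of treatments C and D.
dead_c = [17, 15, 12, 20, 21]
dead_d = [17, 17, 17, 22, 13]

mean_y_c = sum(angular(d) for d in dead_c) / len(dead_c)
mean_y_d = sum(angular(d) for d in dead_d) / len(dead_d)
kill_c = 100.0 * sum(dead_c) / (N_INSECTS * len(dead_c))
kill_d = 100.0 * sum(dead_d) / (N_INSECTS * len(dead_d))

print(round(kill_c), round(mean_y_c, 2))
print(round(kill_d), round(mean_y_d, 2))
```

A complete kill transforms to π/2, or about 1.57, which is the bracketed value shown for treatment B.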


An examination of several experiments has not suggested any 
difference in variance for the $\sin^{-1}\sqrt{x}$ variate over different parts of
the possible range, and the value obtained seems to be fairly constant 
from one experiment to another of the same series. For the experi¬ 
ments originally considered, Table XXV gives the estimated values. 

Table XXV.

Insecticide Tests on Blowflies.

Observed variance of $\sin^{-1}\sqrt{x}$.

Experiment.    D.F.    Variance.       Experiment.    D.F.    Variance.

     1          66     0.02925              6          19     0.04614
     2          52     0.03817              7          25     0.03398
     3          43     0.02901              8          24     0.03505
     4          52     0.03827              9          66     0.01550
     5          20     0.02587
                                         Total        367     0.030682











On testing the differences between these variances, a significant
value of $\chi^2$ (see p. 158) was obtained, this being due to the last variance
being an exception to the general agreement, as is seen in Table
XXVI.*

Table XXVI.

$\chi^2$ Test of Variances.

                        D.F.    Crude $\chi^2$.    Corrected $\chi^2$.

Experiments 1-8           7        3.63
9 v. remainder            1       14.06
Experiments 1-9           8       17.71            17.51


On enquiry, it was found that the last experiment might have 
been distinguished from the other eight because of the technique 
adopted, so that it is of interest to see whether for other experiments 
of the same type the lower variability is maintained. An examination
of three more of the same kind has given values 0.01143 (36 d.f.),
0.01360 (16 d.f.) and 0.03037 (27 d.f.). While these tend to agree in
suggesting a lower average, they appear somewhat heterogeneous,
$\chi^2$ for the four experiments giving the significant value of 8.57 (3 d.f.).

The absolute value of the variance is always worth watching 
(for experiments 1-8, the mean value is 0.0340, for the other four,
0.0170). For homogeneous variation it should be of the order of
$\tfrac{1}{4} \times \tfrac{1}{25} = 0.01$,† so that the maximum theoretical reduction in the
variance for experiments 1-8 is somewhat over 3, while for the
remaining experiments (if the value 0.0170 is representative) it seems
nearer 1 Unless a purer and more sensitive stock of insects were 
bred, this indicates correspondingly the maximum increase in in¬ 
formation that a better control of causes of heterogeneity would make 
possible. The numerical results reached here must be regarded as 
provisional, but they suggest that part of this heterogeneity may be 
dependent on the particular technique employed. 
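The mean variances and the ratios to the theoretical value can be checked as follows. This is a minimal sketch: it assumes batches of 25 insects (the figure used in the footnote example), so that the binomial sampling variance of the angular variate in radians is 1/(4 × 25) = 0.01; the names are illustrative.

```python
# (degrees of freedom, observed variance) pairs copied from the text
EXPERIMENTS_1_TO_8 = [(66, 0.02925), (52, 0.03817), (43, 0.02901), (52, 0.03827),
                      (20, 0.02587), (19, 0.04614), (25, 0.03398), (24, 0.03505)]
OTHER_FOUR = [(66, 0.01550), (36, 0.01143), (16, 0.01360), (27, 0.03037)]

INSECTS_PER_BATCH = 25  # assumption, taken from the footnote example


def pooled_variance(groups):
    """Mean variance, weighting each estimate by its degrees of freedom."""
    total_df = sum(df for df, _ in groups)
    return sum(df * var for df, var in groups) / total_df


def theoretical_variance(n):
    """Sampling variance of sin^-1 sqrt(p), in radians, for a batch of n."""
    return 1.0 / (4.0 * n)


mean_1_to_8 = pooled_variance(EXPERIMENTS_1_TO_8)
mean_other_four = pooled_variance(OTHER_FOUR)
floor = theoretical_variance(INSECTS_PER_BATCH)

# Ratio of observed to theoretical variance: the largest factor by which
# better control of heterogeneity could reduce the variance.
ratio_1_to_8 = mean_1_to_8 / floor
ratio_other_four = mean_other_four / floor

print(round(mean_1_to_8, 4), round(mean_other_four, 4))
print(round(ratio_1_to_8, 1), round(ratio_other_four, 1))
```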

While heterogeneity exists, and strict randomization of treatments 
is not practicable, the validity of an analysis, even for such replicated 
experiments, will, of course, depend on similar assumptions to those 
necessary in the more precise evaluation of insecticides. Replication 

* The test will, of course, be slightly disturbed by the nature of the sin⁻¹ √x
variate for these data.

† The effect of discontinuity on the variance at the ends of the range (cf.
(19) p. 74) has been rather ignored in the present discussion. When the
observed variance is of the order 0.03, it seems perhaps rather a superfluous
refinement to try to correct for it. It might, however, be noted here that a
good correction {rather than the use of ½ considered in (19)} is simply to
write ¼ for 0 (and n − ¼ for n). For example, 25 insects dead out of 25 would
then count for the purpose of the sin⁻¹ variate as 99 per cent., not 100 per cent.,
dead.
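The end-of-range correction in the second footnote can be illustrated by a short sketch. It assumes the correction reads "write ¼ for 0 and n − ¼ for n", which is the reading that agrees with the 99 per cent. example; the function names are hypothetical.

```python
import math


def corrected_count(dead, n):
    """Replace a count of 0 by 1/4 and a count of n by n - 1/4 (assumed reading)."""
    if dead == 0:
        return 0.25
    if dead == n:
        return n - 0.25
    return float(dead)


def angular_variate(dead, n):
    """sin^-1 sqrt(p) in radians, using the corrected count at the ends of the range."""
    return math.asin(math.sqrt(corrected_count(dead, n) / n))


# 25 insects dead out of 25 count as 99 per cent., not 100 per cent.
effective_percentage = 100.0 * corrected_count(25, 25) / 25
print(effective_percentage)
print(angular_variate(25, 25), angular_variate(0, 25))
```

The two extreme values are placed symmetrically about π/4, since sin⁻¹ √0.99 and sin⁻¹ √0.01 add up to π/2.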



1937] of Research in Agriculture and Applied Biology . 169 


must not tacitly be assumed to imply a valid experiment. The risk 
of any appreciable systematic error might be considered negligible,
but if so, the value of a check investigation to test this assumption, 
analogously to the test for bias in sampling investigations, might with 
any routine type of experiment be well worth while. (For the above 
type of experiment, the experimenter has not so far detected any 
appreciable errors of this kind.) 

Concluding Remarks. 

While in this paper I have made no attempt to confine the discus¬ 
sion to one topic, there are, of course, many applications of statistical 
methods in agriculture and allied fields on which I have not touched. 
I have naturally referred in particular to those topics about which I 
felt perhaps I had something useful to say. 

One experiment on Brussels sprouts was briefly referred to. This 
indicated the use of familiar field layouts, but for market garden crops. 
Such crops sometimes appear somewhat variable, not perhaps so 
much in their final yields as in the yields of early pickings, a point to 
be considered in view of the economic importance of these early 
yields. 

A major problem not discussed is experimentation on permanent 
grassland. Besides the necessity of studying ecological changes of 
composition in a pasture as well as its immediate yield, a practicable 
method of keeping a pasture grazed by livestock, and at the same time 
evaluating statistically its production under two or more different 
treatments, has not, so far as I know, been reached (cf. (21) p. 69).

It is on this note of incompleteness that this paper is concluded. 

Acknowledgments.

Finally, it remains for me to express my grateful thanks to 
members of the staff of Jealott’s Hill Research Station (both Ferti¬ 
lizer and Pest Control Divisions), to whom I am indebted for experi¬ 
mental data referred to in the paper. I hope they will accept these 
general thanks as a sufficient acknowledgment. 

With Jealott’s Hill I associate Dr. F. Crowther, to whom, as 
representative of the Joint Research Scheme, it is a pleasure to express
my indebtedness for the Egyptian data: also Mr. G. E. Blackman,
who is now at the Imperial College of Science and Technology.

References. 

(1) R. A. Fisher. “Statistical Methods for Research Workers” (5th ed.,
1934).

(2) R. A. Fisher. “The Design of Experiments” (1935).

(3) F. Yates. “Complex Experiments.” J. Roy. Stat. Soc. (Suppl.), 1935,
2, 181-223.




170 


Discussion 


[No. 2, 


(4) F. Crowther and Ahmed Mahmoud. “A Preliminary Investigation of the
Interrelation of Variety, Spacing, Nitrogen and Water Supply, with
Reference to Yields of Cotton.” Royal Agricultural Society, Egypt;
Bulletin No. 32 of Technical Section, 1935.

(5) F. Crowther. “The Effects of Variety, Spacing, Nitrogen and Water
Supply on the Development of the Cotton Plant and the Rate of its
Absorption of Nitrogenous Fertilizer.” Bulletin No. 25, 1936.

(6) F. Crowther, A. Tomforde and Ahmed Mahmoud. “Nitrogenous and
Phosphatic Manuring of Cotton and their Relation to Variety and
Spacing.” Bulletin No. 26, 1936.

(7) J. Wishart. “Statistics in Agricultural Research.” J. Roy. Stat. Soc.
(Suppl.), 1934, 1, 26-51.

(8) F. Yates. “The Analysis of Replicated Experiments when the Field
Results are Incomplete.” Emp. J. Exp. Agric., 1933, 1, 129-42.

(9) F. Crowther. “Interrelation of Nitrogenous Manuring, Variety and
Spacing for the Wheat Crop.” Bulletin No. 24 (B), 1936.

(10) S. J. Watson and W. S. Ferguson. “The Nutritive Value of Artificially
Dried Grass and its Effect on the Quality of Milk Produced by Cows of
the Main Dairy Breeds.” J. Agric. Sci., 1936, 26, 189-211.

(11) M. S. Bartlett. “An Examination of the Value of Covariance in Dairy-cow
Nutrition Experiments.” J. Agric. Sci., 1935, 25, 238-44.

(12) S. J. Watson and W. S. Ferguson. “The Value of Artificially Dried
Grass, Silage made with Added Molasses and A.I.V. Fodder in the Diet
of the Dairy Cow and their Effect on the Quality of the Milk, with
Special Reference to the Value of the Non-protein Nitrogen.” J.
Agric. Sci., 1936, 26, 337-67.

(13) S. J. Watson, J. C. Drummond, I. M. Heilbron and R. A. Morton. “The
Influence of Artificially Dried Grass in the Winter Ration of the Dairy
Cow on the Colour and Vitamin-A and -D Contents of Butter.” Emp.
J. Exp. Agric., 1933, 1, 68-81.

(14) S. J. Watson and W. S. Ferguson. “The Losses of Dry Matter and
Digestible Nutrients in Low-temperature Silage, with or without Added
Molasses or Mineral Acids.” J. Agric. Sci., 1937, 27, 67-107.

(15) M. S. Bartlett. “Properties of Sufficiency and Statistical Tests.” Proc.
Roy. Soc., 1937, A 160, 268-82.

(16) A. H. Lewis. “The Relative Values of Inorganic and Organic Nitrogenous
Manures. I. Field Experiments at Datchet, 1934 and 1935.” I.C.I.
Ltd., A.R.A. Report, No. 434.

(17) J. O. Irwin. “Statistical Method Applied to Biological Assay.” J. Roy.
Stat. Soc. (Suppl.), 1937, 4, 1-60.

(18) M. S. Bartlett. “Some Notes on Insecticide Tests in the Laboratory and
in the Field.” J. Roy. Stat. Soc. (Suppl.), 1936, 3, 185-94.

(19) M. S. Bartlett. “The Square-root Transformation in Analysis of
Variance.” J. Roy. Stat. Soc. (Suppl.), 1936, 3, 68-78.

(20) M. S. Bartlett. “Sub-sampling for Attributes.” J. Roy. Stat. Soc.
(Suppl.), 1937, 4, 131-5.

(21) J. Wishart and H. G. Sanders. “Principles and Practice of Field
Experimentation.” Empire Cotton Growing Corporation, 1935.

(22) C. I. Bliss. “The Analysis of Field Experimental Data expressed in
Percentages.” [In Russian]. Plant. Prot. Leningrad, 1937, fasc. 12,
67-77.


Discussion on Mr. Bartlett’s Paper. 

Mr. Gosset : On reading Mr. Bartlett’s paper, I saw that I 
could add little or nothing to his treatment of the statistical principles 
involved, but it occurred to me that other people besides myself 
might have had their curiosity aroused by certain matters of less 
interest perhaps statistically but yet of some practical importance. 
I refer of course to the results of the experiments. I therefore wrote 



1937]^ on Mr. Bartlett's Paper. 171 

to Mr. Bartlett, who very kindly sent me his copies of the four 
papers in the list of references, with which Dr. Crowther’s name is 
associated, and I am going to give an account, necessarily inadequate,
of the fine piece of work which they describe. 

The four papers deal primarily with the cotton crop in Egypt, 
particularly in the Delta. Cotton is grown in Egypt as an annual 
and not, as might be expected, as a perennial, because of the pests 
by which it is afflicted, especially the Pink Boll Worm. This has so
much increased of late years that the methods of cultivation have 
during the last ten years been modified throughout the country. 
At the same time, new varieties have been introduced, the tendency 
being to produce larger yields of cotton of shorter staple. That 
being so, it became necessary to examine how far these changes have 
altered the old standards of manuring and, particularly, what
profit was to be derived from nitrogenous manures.

The experiments directed by Dr. Crowther were concerned mainly
with the elucidation of this question and, as you have heard from 
Mr. Bartlett, were carried out at several stations, where the effects 
of various levels of nitrogenous manuring were compared under 
different conditions of spacing, watering and phosphate manuring, 
and with different varieties of cotton. 

The actual gain from the use of nitrogen varied with the spacing 
adopted, with the different varieties and, naturally enough, between 
the different stations, but the average profit from the use of nitro¬ 
genous manures was over £3 per acre, and at only one out of eight 
stations was the profit not appreciable. Had the optimum quantity 
of nitrogen been used, the gain would have been considerably more. 
Furthermore, an experiment with wheat following cotton at a single 
station showed that in that case the increased yield of wheat more 
than paid for the nitrogen applied to the cotton. I think that is a 
very good instance of what large gain can be made : £3 an acre on all 
the cotton of Egypt would produce an enormous amount of money. 

These results may not seem to be very surprising until you learn 
(a) that previously it was generally believed that nitrogen was of 
little or no value to the cotton crop, and (b) that in Egypt nitrogenous 
residues were supposed to be leached out by the irrigation water. 

An investigation into the relation between the supply of nitrogen 
and the development of cotton leads Dr. Crowther to the opinion 
that it is largely owing to the closer spacing of modern practice 
that the plant can make good use of added nitrogen, but I should like 
to ask whether the substitution of the modern nitro-chalk for nitrate 
of soda or ammonium sulphate may not also have had a beneficial 
effect of its own. 

I have now much pleasure in moving a very hearty vote of 
thanks to Mr. Bartlett for his paper, and if I have rather strayed from 
the straight and narrow path which he has himself followed, I have 
done so in the confident expectation that my lapse will be atoned 
for by the speakers who will follow. 

Dr. Irwin : I should like to add my tribute to that of Mr. Gosset 
for this exceptionally interesting paper. It may perhaps be likened 





to a guide book which describes ascents to interesting summits, 
giving views over a wide and varied country. 

I was filled with admiration for the experiments with which the 
paper opened and the way in which they had been handled.

I have no criticism at all to make of these experiments, but they 
have provoked me to one or two reflections about various meanings 
which have been attached to the term “interaction.” First, however,
I should like to refer to later sections.

With regard to the dairy-cow experiments, the way in which the 
yield of the missing cow may be estimated by the introduction of a 
pseudo-variate is extremely neat. For one missing cow it is quite 
easy to see that the result given is the same as would be obtained 
by minimizing the error term. I should like to ask Mr. Bartlett if 
it works in the same sort of way when there are two or more missing 
cows? Can a pseudo-variate be introduced for each, or is there
more in it than that ? 

The third thing that occurred to me as worth mentioning was the 
new test for testing whether several variances differed significantly 
or not, which is modestly tucked away on p. 158, and which Mr. 
Bartlett never even read, except for pointing out the misprints. 
We shall await with impatience the appearance of the Royal Society
paper in which the theory of that test is described. 

Lastly, I would refer to the section on the efficient detection of 
treatment effects. In the second example, where a regression term 
is isolated, the regression being not, as is usual, on the order of 
treatment, but on an extraneous but relevant factor, this struck 
me as an important device which might be of use on other 
occasions. 

Referring again to the question of the different meanings of the 
term “interaction”; there is no doubt, I think, that they are
equivalent, but the formal treatment of their equivalence has been 
rather neglected. A few years ago, when I was writing about very 
elementary points in the algebra of analysis of variance, I gave a 
definition of an “ interaction ” sum of squares, or at any rate it was 
implied in the work. The ordinary first-order interaction sum of 
squares is, on this view, the sum of terms of the type 

\[ S(x_{uv} - \bar{x}_{u\cdot} - \bar{x}_{\cdot v} + \bar{x})^2 \qquad (1) \]

This definition has certain advantages, but the individual terms which 
make up the sum of squares are not independent. 

A second definition is as follows: Every item in the record belongs
to two classes, which we may call treatments, with several groups in
each class. Suppose, for instance, in the first class there are three
different levels of treatment, $(x_1)$, $(x_2)$, $(x_3)$ being the totals at these
levels. Then the sum of the squares of the deviations from their
mean can be expressed as the sum of the squares of two independent
linear functions

\[ l_1(x_1) + l_2(x_2) + l_3(x_3) \]
\[ m_1(x_1) + m_2(x_2) + m_3(x_3) \]

The same can be done with the second class of the classification,
the independent linear functions being, say,

\[ n_1(y_1) + n_2(y_2) + n_3(y_3) \]
\[ p_1(y_1) + p_2(y_2) + p_3(y_3) \]

If all the nine possible combinations of the two different sorts of
treatments are formed, there will be nine items in the record, which
may be denoted by $(x_1y_1)$, $(x_1y_2)$, $(x_1y_3)$, etc. If one takes the sets
of coefficients

\[
\begin{array}{ll}
(l_1, l_2, l_3) & (n_1, n_2, n_3) \\
(m_1, m_2, m_3) & (p_1, p_2, p_3)
\end{array}
\]

combines one set from the first pair with one set from the second
pair, e.g. $(l_1, l_2, l_3)$ and $(n_1, n_2, n_3)$, forming all the products such as
$l_1n_1$, $l_1n_2$, $l_1n_3$, etc., and associating these with $(x_1y_1)$, $(x_1y_2)$, $(x_1y_3)$,
etc., four linear functions of the nine items will be obtained, and
these will constitute the four degrees of freedom. It is not very
difficult to show that the sum of these four squares gives the same
answer as if the first definition were used.
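The agreement of the first two definitions can be checked numerically. The sketch below uses a purely hypothetical 3 × 3 table and assumes that each set of coefficients is a normalized contrast (coefficients summing to zero, with unit sum of squares, the two sets in each pair being orthogonal); the linear and quadratic contrasts are one convenient choice, not one prescribed in the discussion.

```python
import math

# Hypothetical 3 x 3 table of the nine items (x_u y_v); illustrative values only.
TABLE = [[12.0, 15.0, 11.0],
         [14.0, 20.0, 13.0],
         [10.0, 18.0, 19.0]]

# One pair of normalized, mutually orthogonal contrasts (an assumed choice).
LINEAR = [-1 / math.sqrt(2), 0.0, 1 / math.sqrt(2)]
QUADRATIC = [1 / math.sqrt(6), -2 / math.sqrt(6), 1 / math.sqrt(6)]


def interaction_ss_first_definition(t):
    """Sum of squares of x_uv - (row mean) - (column mean) + (general mean)."""
    rows, cols = len(t), len(t[0])
    row_mean = [sum(row) / cols for row in t]
    col_mean = [sum(t[u][v] for u in range(rows)) / rows for v in range(cols)]
    grand = sum(sum(row) for row in t) / (rows * cols)
    return sum((t[u][v] - row_mean[u] - col_mean[v] + grand) ** 2
               for u in range(rows) for v in range(cols))


def interaction_ss_second_definition(t, first_pair, second_pair):
    """Sum of the four squared linear functions built from products of coefficients."""
    total = 0.0
    for a in first_pair:
        for b in second_pair:
            value = sum(a[u] * b[v] * t[u][v]
                        for u in range(len(a)) for v in range(len(b)))
            total += value ** 2
    return total


ss_first = interaction_ss_first_definition(TABLE)
ss_second = interaction_ss_second_definition(
    TABLE, [LINEAR, QUADRATIC], [LINEAR, QUADRATIC])
print(ss_first, ss_second)
```

Any other pair of normalized orthogonal contrasts could be substituted for the linear and quadratic ones without changing the total, which is the sense in which the four degrees of freedom are defined independently of the particular coefficients chosen.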

There is still a third sense in which the term “interaction” has
been used, in fact as it is used by Mr. Bartlett on p. 147 in that
extremely interesting residual experiment. If we take the residual
wheat treatments and form the sums of all possible combinations
of the three levels of nitrogen with the three levels of spacing thus:

\[
\begin{array}{ll}
(A_1) = (N_0S_0) + (N_1S_1) + (N_2S_2) & (A_2) = (N_0S_0) + (N_1S_2) + (N_2S_1) \\
(B_1) = (N_0S_1) + (N_1S_2) + (N_2S_0) & (B_2) = (N_0S_1) + (N_1S_0) + (N_2S_2) \\
(C_1) = (N_0S_2) + (N_1S_0) + (N_2S_1) & (C_2) = (N_0S_2) + (N_1S_1) + (N_2S_0)
\end{array} \qquad (2)
\]

a comparison can be made between these six quantities, only four
of which are independent. On taking the sum of the squares of the
deviations of the first three from their means and of the second three
and adding them together, the result is the same interaction sum of
squares as would be got from the first definition. But when it comes
to higher-order interactions, the algebraical equivalence becomes
rather hard to demonstrate. Following Mr. Bartlett, we may denote
the quantities corresponding to (2) for the maize treatments by
$X_1, Y_1, Z_1$; $X_2, Y_2, Z_2$
and then form the further quantities

\[
\begin{array}{ll}
(A_1X_1) + (B_1Y_1) + (C_1Z_1) & (A_1X_1) + (B_1Z_1) + (C_1Y_1) \\
(A_1Y_1) + (B_1Z_1) + (C_1X_1) & (A_1Y_1) + (B_1X_1) + (C_1Z_1) \\
(A_1Z_1) + (B_1X_1) + (C_1Y_1) & (A_1Z_1) + (B_1Y_1) + (C_1X_1)
\end{array} \qquad (3)
\]

with three other similar sets of six quantities obtained by combining
$(A_1, B_1, C_1)$, $(X_2, Y_2, Z_2)$; $(A_2, B_2, C_2)$, $(X_1, Y_1, Z_1)$ and $(A_2, B_2, C_2)$,
$(X_2, Y_2, Z_2)$. If for each set of three we take the sum of squares of
deviations from their means and add together the results, we obtain
the “interaction” sum of squares corresponding to the 16 degrees
of freedom for the third-order interaction between wheat nitrogen,
wheat spacing, maize nitrogen and maize spacing. The 2 degrees
of freedom represented by the left-hand side of (3) are those which
Mr. Bartlett confounds with blocks. This must, of course, give the
same answer as if the “interaction” sum of squares were defined,




as in the first definition, by non-orthogonal sums of squares. I do 
not believe this has ever been shown formally, and it might be worth 
the while of some mathematician to do so, because it is rather con¬ 
fusing to the student to be given several definitions or usages of the 
same thing without their equivalence being shown. This is not a 
criticism of the paper, as it would not have been appropriate to 
include a piece of formal algebra of this kind. 
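For the first-order case the equivalence of the third usage with the first definition is easily verified numerically. In the sketch below the 3 × 3 table of nitrogen-by-spacing totals is hypothetical, the two sets of three sums are taken along the two systems of diagonals of the table, and the sums of squares of deviations are divided by 3 because each sum contains three cells; this scaling is an assumption about what is intended, and the function names are illustrative.

```python
# Hypothetical totals (N_u S_v) for three levels of nitrogen and of spacing.
TABLE = [[12.0, 15.0, 11.0],
         [14.0, 20.0, 13.0],
         [10.0, 18.0, 19.0]]


def interaction_ss_first_definition(t):
    """Sum of squares of x_uv - (row mean) - (column mean) + (general mean)."""
    row_mean = [sum(row) / 3 for row in t]
    col_mean = [sum(t[u][v] for u in range(3)) / 3 for v in range(3)]
    grand = sum(sum(row) for row in t) / 9
    return sum((t[u][v] - row_mean[u] - col_mean[v] + grand) ** 2
               for u in range(3) for v in range(3))


def diagonal_sums(t, sign):
    """Three sums over cells with (v + sign * u) mod 3 = 0, 1, 2.

    sign = -1 gives one system of diagonals, sign = +1 the other."""
    sums = [0.0, 0.0, 0.0]
    for u in range(3):
        for v in range(3):
            sums[(v + sign * u) % 3] += t[u][v]
    return sums


def ss_of_deviations(values):
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values)


first_three = diagonal_sums(TABLE, -1)
second_three = diagonal_sums(TABLE, +1)

# Each sum contains three cells, hence the divisor 3 (assumed scaling).
ss_from_diagonals = (ss_of_deviations(first_three)
                     + ss_of_deviations(second_three)) / 3
ss_first = interaction_ss_first_definition(TABLE)
print(ss_from_diagonals, ss_first)
```

Both sets of three sums add up to the general total, which is why only four of the six quantities are independent.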

I have much pleasure in seconding the vote of thanks to Mr.
Bartlett for his interesting paper. 

Dr. Wishart said it gave him particular pleasure to speak 
on this occasion because it was one of the best ways of commending 
the progress of an old pupil to welcome his public appearance while 
giving a useful contribution to statistical science. 

A good many years ago Imperial Chemical Industries had laid 
down a large number of experiments with the modern randomized 
layout: the good work they had done was well known. The experi¬ 
ments Mr. Bartlett had selected for examination showed that 
Imperial Chemical Industries was still to the fore in this connection, 
and he, personally, regretted to hear that possible changes in policy 
had led to a situation in which this excellent pioneer work was 
unlikely to be continued on anything like its past scale. 

Dr. Irwin had put the discussion on a plane that made it particu¬ 
larly fitting for this Section. He had discussed rather technically 
some of the statistical aspects; Dr. Wishart did not propose to 
follow him there, but would merely say that each experiment 
described had been carefully chosen to illustrate some point, and if 
there were a fault, it was that the author had given his audience 
mental indigestion by the offer of so much food at once. 

Those who had wondered if, with the adoption of Professor 
Fisher’s methods, the last words had not been spoken on the question 
of layout, would be agreeably surprised at the insight shown by the 
author and his own contributions to the subject. The author was 
evidently a convert to the complex factorial designs, including 
confounding. At Cambridge they were more conservative and, 
generally speaking, the experiments there were kept simple. At 
outside centres simple experiments were undertaken, but in Cam¬ 
bridge itself somewhat more complicated designs were often used. 
The split-plot design was popular; it was useful and easy to work.
Usually from eight to twelve treatments were needed, so that a 
simple experiment would mean that the blocks were rather large. 
One series was generally only suitable for large plots, and so it was 
desirable to divide the block into a number of plots for the main 
treatments, and sub-divide the plots for other comparisons. Often, 
too, the real interest lay in the sub-plot treatments, and their inter¬ 
action with the main-plot treatments, comparisons which could be 
made with great precision on this form of layout. 

Another interesting section of Mr. Bartlett’s paper was that 
dealing with Dairy-Cow Nutrition Experiments. There again he 
could refer to experiments carried out at Cambridge. There was one in 
which different levels of oil were fed to the same cow for periods 





of one, two or four weeks, and in order to get over the difficulty of 
the variation of the lactation curve, the factors cows, weeks and 
treatments were arranged as in a Latin square (4 X 4 or 5 X 5). 
The errors were much lower than usual with milk-yield trials. It was 
hoped in the future to improve such a trial by having two or three 
cows in a group instead of one. They also had suffered from missing 
cows owing to their going dry, and had had to allow for this. The 
method was not of universal application because quick changeovers 
could not always be arranged. Dr. Sanders had expressed the opinion 
that the randomized block layout might perhaps be most suitable 
in general experiments of this kind. 

Another interesting experiment related to pigs. A contrast 
was made between a group-feeding trial with three pens containing 
ten pigs each, different feeding being given in each pen, and an 
individual trial with five pens, each containing six pigs, three hogs 
and three gilts from the same sow. The six pigs made their way to 
individual boxes at feeding time. Live weight gains were calculated 
over a sixteen-week period. There was a great gain in accuracy with 
the individual trial; the standard error for each pig’s live-weight 
gain was 6^ per cent as compared with 124 per cent. The small 
increase in live-weight gain with extra protein was quite insignificant 
in the group trial, but was detected in the other by means of a 
co-variance analysis on initial weight. In the group trial it was 
found that the regression of live-weight gain on initial weight was 
not significant. It has often been supposed that in animal trials 
the precision can be increased by correcting for initial weight. 
Here there was a considerable gain by correction in the case of 
individual animals, but none whatever in the case of the groups. 
A further advantage of the individual trial was that the meal con¬ 
sumption of each pig was known, enabling calculations to be made on 
the efficiency of meal conversion, an important practical point. 

Mr. Bartlett might with advantage have gone more closely into 
the details of the analysis in certain cases for the benefit of those not 
so familiar with the methods used. On p. 160 there was a case of 
an interaction of two regressions when the regression was taken out 
as part of a treatment effect. The statement is made that the inter¬ 
action term is obtained from the table of treatment totals by use 
of the multiplying scheme given. Dr. Wishart said that not only did 
he suspect that there was a wrong sign in one of the brackets, but 
he was doubtful whether this was a complete description of how the 
figures were obtained, while it did not tell the reader how to deal 
with a different number of treatments. This was only one instance 
of a large number of semi-arithmetical, semi-algebraical points 
which many ordinary workers had difficulty in following, and it 
suggested the need for some popular exposition which would tell 
them what they wanted to know. 

Mr. Blackman said that in his paper Mr. Bartlett had instanced 
some data on the changes in the clover content of experimental 
plots. One of the main difficulties in estimating changes in the 
botanical composition of plant communities was the error due 





to the personal factor. In the majority of agricultural and ecological 
researches, the criteria used (numbering tillers, percentage area 
covered) had all been subject to the personal factor. The estimate 
of “ percentage absence,” a determination relatively free from per¬ 
sonal bias, had not up to the present time been considered favourably, 
largely because it had not been realized that the logarithm of 
percentage absence was proportional to the density, provided that 
the distribution was at random. During the last two years several 
papers had appeared in which it had been demonstrated that many 
species in different types of associations were distributed at random. 
There was also evidence that even in cases where the distribution 
was not at random, the linear relationship between the logarithm 
of percentage absence and density still held. The realization that 
statistical methods could be used in ecological researches was now 
gaining ground, particularly in the correlation of factors with the 
presence of the species, for example the influence of soil acidity on 
the incidence of bracken. Mr. Bartlett had indicated additional 
ways in which the statistician could materially help the biologist in 
the solution of his problems. 
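The relation between percentage absence and density follows from the Poisson law: if plants are distributed at random with mean density m per sample area, the proportion of sample areas containing none is e⁻ᵐ, so that the logarithm of the percentage absence is linear in m. A minimal sketch, with a hypothetical function name, is:

```python
import math


def density_from_percentage_absence(percentage_absent):
    """Mean number of plants per sample area, assuming a random (Poisson)
    distribution: proportion absent = exp(-density)."""
    if not 0.0 < percentage_absent <= 100.0:
        raise ValueError("percentage absence must lie in (0, 100]")
    return -math.log(percentage_absent / 100.0)


# A species absent from half the sample areas has density log 2 per area.
print(density_from_percentage_absence(50.0))
```

The estimate depends only on a count of empty sample areas, which is why it is relatively free from the personal factor; it cannot be used when the species occurs in every sample area.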

Mr. Cochran was interested in Mr. Gosset’s remark that before 
Dr. F. Crowther’s experiments in Egypt the need for nitrogenous 
manuring had not been appreciated. He thought it was sometimes 
the reverse with the British farmer. Experiments had recently 
been carried out on commercial farms with potatoes, applying 
0, 4, 8, 12 and 16 cwt. of mixed fertilizer per acre to the crop, and in
writing his report he was asked to stress the level at which the crop 
failed to respond to any further dressing, which was about 12 cwt. 
He was told that, if left to themselves, the farmers would have put 
on a ton to the acre straight away. 

Mr. Cochran had come across the problem which Dr. Irwin had 
mentioned, in demonstrating to students the principles on which 
some of the confounded designs were based. In attempting to show 
that Dr. Irwin’s expressions were second-order interactions, he had 
taken the line that they added up to the right number of degrees of 
freedom, that they were all independent, and that they were clearly 
neither main nor first-order effects. 

He would like to add one or two observations on his own ex¬ 
perience of some of the problems about which Mr. Bartlett had 
spoken. In split-plot experiments the question always arose as to 
the relative efficiency of the main-plot and sub-plot treatments, 
and at Rothamsted they had of late been using split plots mainly
where no great precision was required on the main treatments. 
Mr. Yates had recently been considering the possibility of increasing 
still further the accuracy of sub-plots by arranging them in a Latin 
square formation. This was possible, for instance, where there were 
four main treatments such as varieties, and four sub-treatments such 
as nitrogen, potash, neither, and both. This procedure would widen 
still further the gap between main and sub-plot accuracy, but as 
long as one was interested mainly in sub-plots, that did not matter. 

The field of animal nutrition had until very recently been 





particularly rich in experiments which gave no sign of any treatment 
effects whatsoever. There were many reasons for this, but one of 
the most potent was lack of experience in this field of work. It was 
not yet known how great a difference was to be expected from any 
treatment contrast, or with what accuracy one could expect to 
detect a difference with experiments of a given size. In this situation, 
Mr. Bartlett’s information about the precision with which experi¬ 
ments could be carried out, and his tips about how to carry them out, 
were particularly welcome. 

Mr. Cochran had been much interested in the author’s note on 
the germination experiment, in which he discussed the question 
of testing differences in the rate of germination. This problem was 
turning up with increasing frequency, and had been found in one 
connection mentioned also in the conclusions—in experiments with 
fruit and market garden crops; in these crops it meant a good
deal in cash to the farmer if he could get his produce on the market 
earlier than the other man. In experiments on these crops it was 
often the custom to go through the field a number of times, picking 
on each occasion that part of the crop that was ready for the market. 
That gave a set of data of the same type as Table XIX, and the question 
at issue was the same. In most cases he had dealt with it by trying 
to determine what would correspond to the mean date of germination.
This was the same as Mr. Bartlett’s index if the observation dates 
were at equal intervals, and was not likely to differ widely from 
Mr. Bartlett’s index in any case. Of course, the mean date might 
not be the best index to use. In market crops one could do better 
if one knew the prices which each picking realized on the market. 
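The mean date of picking, and the alternative of weighting by price, may be sketched as follows. The figures are hypothetical, the function names are illustrative, and no attempt is made to reproduce Mr. Bartlett's index, whose definition is not given in this part of the discussion.

```python
def mean_date(days, amounts):
    """Mean date of picking: each date weighted by the amount picked on it."""
    total = sum(amounts)
    return sum(day * amount for day, amount in zip(days, amounts)) / total


def cash_value(amounts, prices):
    """Total return when each picking is valued at the price it realized."""
    return sum(amount * price for amount, price in zip(amounts, prices))


# Hypothetical plot: days after the first picking, yield and price per unit.
days = [0, 7, 14, 21]
early_plot = [30.0, 40.0, 20.0, 10.0]
late_plot = [10.0, 20.0, 40.0, 30.0]
prices = [5.0, 4.0, 3.0, 2.0]   # earlier produce assumed to fetch more

print(mean_date(days, early_plot), mean_date(days, late_plot))
print(cash_value(early_plot, prices), cash_value(late_plot, prices))
```

In this made-up case the two plots have the same total yield, so the whole of the difference in cash value comes from earliness, which is the point at issue for market crops.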

In conclusion, Mr. Cochran said he had come to expect from Mr. 
Bartlett sound and mature reflection on any problem which he 
touched, and the paper that evening was full of examples of this. 

Mr. Fairfield Smith expressed his appreciation at having been
invited, while a visitor in London, to attend these meetings, which 
provided opportunities for hearing and meeting men who were 
well known to all research workers interested in the technique 
and logic of experimentation. He wished to associate himself 
with previous speakers in their appreciation of the paper. It should 
prove a valuable source of reference, since it provided so excellent 
a series of examples in the methodology of experimental statistics. 
He had recently had an interesting case of the efficient detection of 
treatment effects which resembled Mr. Bartlett’s first and second 
examples, but was not quite covered by either. The figures were 
taken from an experiment with nine varieties of wheat at various 
spacings. The first reduction of treatment effects to regressions 
gave, for the relevant parts of the analysis of variance, the figures in 
the first three lines of the following table. These failed to show 
any significant effects due to spacing, and, varieties being qualitative 
things, it was not at first easy to see how the data could be further 
reduced to a regression which could be used like that for the manuring 
treatments of example (i). But it was noticed that while the mean 
regression of yield on plant density was close to zero, the early varieties 



all gave negative regression coefficients and the late varieties gave 
positive ones. By taking out the regression of varietal regression 
coefficients on dates of flowering, the interaction sum of squares 
could be subdivided as shown in the last two rows of the table. 
It was now shown that the response to plant density in the region
explored by this experiment depended on the type of variety (P =
about 0.025).

                                                              D.F.   M.Sq.
Average regression of yield on plant density ...               1     101
Interaction of regressions with varieties ...                  8     539
Error ...                                                     55     375
Regression of regression coefficients on flowering dates ...   1    2134
Remainder ...                                                  7     311
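The subdivision in the table can be checked arithmetically. The short sketch below recovers the sums of squares from the tabulated mean squares and forms the variance ratios against error; the variable names are illustrative, and no probability levels are computed.

```python
# (degrees of freedom, mean square) as tabulated above
average_regression = (1, 101)
interaction = (8, 539)
error = (55, 375)
flowering_date_regression = (1, 2134)
remainder = (7, 311)


def sum_of_squares(line):
    df, mean_square = line
    return df * mean_square


def variance_ratio(line, error_line):
    return line[1] / error_line[1]


# The last two lines should account for the 8 d.f. interaction sum of squares
# (apart from rounding of the published mean squares).
interaction_ss = sum_of_squares(interaction)
subdivided_ss = sum_of_squares(flowering_date_regression) + sum_of_squares(remainder)

print(interaction_ss, subdivided_ss)
print(round(variance_ratio(interaction, error), 2),
      round(variance_ratio(flowering_date_regression, error), 2),
      round(variance_ratio(remainder, error), 2))
```

The interaction as a whole gives a variance ratio of only about 1.4 on 8 and 55 degrees of freedom, whereas the single degree of freedom for regression on flowering date gives a ratio of about 5.7, which is where the significance arises.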


They sometimes had observations on several characteristics of 
each experimental plant or animal (e.g. yield of grain, height of plant 
and width of leaves; or measurements of a number of different 
bones of the skull), and although treatment effects on, or group 
differences of, individual characters might not be demonstrable with 
certainty, the cumulative effect on all characters was readily
appreciable. To analyse this type of data some method was needed 
of scoring each individual to obtain a single value incorporating all 
the characteristics. In the past the usual method of doing this had 
been simply to form a compound score in which each characteristic 
had a weight arbitrarily assigned. Since efficiency in detecting differ¬ 
ences was being discussed it seemed appropriate to mention that 
for problems of this sort Professor Fisher had recently described 
(Ann. Eug. 7 : 179) a method for determining those weights which
might be most efficient for any given purpose. 

To illustrate the value of this method, Mr. Fairfield Smith was 
indebted to Mr. Travers for the following data. Six psychological 
tests were given to a group of salesmen and to a group of managers 
to ascertain how well such tests could discriminate managerial 
capacity. When the individual test scores were simply added to 
form unweighted composite scores, their distribution was as shown 
in the upper part of the accompanying figure, in which each vertical 
line marked the score of one person. Discrimination between the 
two groups was poor. But when the individual scores were weighted 
according to a “ discriminant function ” evaluated to maximize 
the ratio of the mean difference to its standard error, the resulting 
weighted composite scores were distributed as in the lower part of 
the figure. And the value of the weighted scores was even greater
than the diagram indicated, because there was reason to believe
that division into two classes above and below a weighted score of 
148 would be more correct than the official classification. 
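
As a rough illustration of the weighting just described, the following sketch computes discriminant weights for two tests only, taking the weights proportional to the inverse of the pooled within-group sums of squares and products applied to the difference of group means. The scores are hypothetical (they are not Mr. Travers's data), and the function names are introduced here for illustration only.

```python
from statistics import mean

def discriminant_weights(group_a, group_b):
    """Weights proportional to S^-1 (mean_a - mean_b) for two test scores."""
    mean_a = [mean(col) for col in zip(*group_a)]
    mean_b = [mean(col) for col in zip(*group_b)]
    # Pooled within-group sums of squares and products.
    s11 = s12 = s22 = 0.0
    for group, m in ((group_a, mean_a), (group_b, mean_b)):
        for x1, x2 in group:
            s11 += (x1 - m[0]) ** 2
            s12 += (x1 - m[0]) * (x2 - m[1])
            s22 += (x2 - m[1]) ** 2
    d1, d2 = mean_a[0] - mean_b[0], mean_a[1] - mean_b[1]
    det = s11 * s22 - s12 ** 2
    return ((s22 * d1 - s12 * d2) / det, (s11 * d2 - s12 * d1) / det)

def separation(weights, group_a, group_b):
    """Squared mean difference of the composite score over its within-group sum of squares."""
    def score(row):
        return sum(w * x for w, x in zip(weights, row))
    a = [score(r) for r in group_a]
    b = [score(r) for r in group_b]
    within = sum((v - mean(a)) ** 2 for v in a) + sum((v - mean(b)) ** 2 for v in b)
    return (mean(a) - mean(b)) ** 2 / within

# Hypothetical scores on two tests (assumed for illustration only).
salesmen = [(10, 20), (12, 25), (11, 15), (13, 22), (9, 18)]
managers = [(14, 21), (16, 17), (15, 24), (17, 19), (13, 23)]

weights = discriminant_weights(salesmen, managers)
crude = separation((1.0, 1.0), salesmen, managers)     # unweighted composite
weighted = separation(weights, salesmen, managers)     # discriminant composite
```

With any such data the weighted composite cannot separate the groups worse than the crude sum, since the weights are chosen to maximize this ratio.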

At the end of section 2 Mr. Bartlett pointed out that if one block 
of an experiment has a greater error variance than other blocks, its 
inclusion might actually diminish the accuracy of the experiment. 
It was not uncommon for the error of one block to be increased by 
some occurrence such as waterlogging, which could not have been 
foreseen when the experiment was laid down. The error component 
of variance in such a block might be increased owing to both greater
heterogeneity within the block and to interaction of treatment
effects with the superimposed condition (similar to the position effect 
noted in the penultimate paragraph of the same section as a potential 
danger in another direction). The increase might or might not be 
great enough to increase the estimated error variance of treatment 
means, but in either case it could be argued that the real amount of 
information in the experiment had been reduced because the estimated
yield of the treatments affected had been to some degree
biased. The argument was unorthodox, but its validity might be 
defended by submitting that the facts to be dealt with departed 
from the axioms of statistical theory. When an isolated major 
error affected a group of treatments to a similar degree, although 
the affected group might be randomly selected out of the total number 
of treatments, yet the fact that it was a group removed the case from 
conformity to the theory of errors in which a large number of small 
errors was supposed to operate at random on each subject individually. 
The problem of rejecting data was thorny, but when an obvious
pocket of low yield due to some abnormal condition occurred in 
just one block, it might be well to reject that block. Might not a 
statistician, finding a block so variable that it added nothing to the 
precision of the experiment, be allowed to invite the field worker to 
judge whether its character might have been due to fair sampling 
error or to some extraneous cause which should not have entered 
into the experiment ? 

There seemed to be a growing tendency in tables of analysis of 
variance to designate the quotient of sums of squares by degrees of 
freedom as “ variance.” Although this was irrelevant to the subject-matter
of the paper, he might perhaps use the occasion to test the
feeling of Fellows of the Society on this point of terminology. No
difficulty would be created for statisticians by some variation in the
headings of a standard table, but the use of the word “ variance ”
here might perhaps confuse private students, and also make it harder
to explain a table of analysis of variance to those who had not made
a special study of statistical methods. The magnitude of this item
greatly depended on the number of observations in each class or sub-class,
and it was not, he submitted, properly a variance at all. It was
(in the notation of Fisher, “ Statistical Methods,” Table 39) an estimate
of kA + B, where k is the number of observations in each class,
A is the variance between population means of classes and B is the 
variance within classes. He thought that the presentation of the 
subject could be made clearer, and some confusion avoided, by 
reserving the term “ variance ” (when used with reference to variation 
between classes) to designate A , and retaining for the item under 
discussion the term “ mean square.” He would be grateful for a 
ruling on this point. 
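
The distinction drawn here can be made concrete with a small sketch, assuming equal class sizes: the mean square between classes estimates kA + B and the mean square within classes estimates B, so that A is estimated by their difference divided by k. The observations below are hypothetical and the function name is introduced for illustration.

```python
from statistics import mean

def mean_squares(classes):
    """One-way analysis with equal class sizes.

    Returns (mean square between, mean square within, estimate of A).
    """
    k = len(classes[0])                               # observations per class
    grand = mean(v for c in classes for v in c)
    ss_between = k * sum((mean(c) - grand) ** 2 for c in classes)
    ss_within = sum((v - mean(c)) ** 2 for c in classes for v in c)
    ms_between = ss_between / (len(classes) - 1)
    ms_within = ss_within / (len(classes) * (k - 1))
    # MS between estimates kA + B and MS within estimates B.
    a_hat = (ms_between - ms_within) / k
    return ms_between, ms_within, a_hat

# Hypothetical observations: three classes of three.
ms_b, ms_w, a_hat = mean_squares([[1, 2, 3], [2, 3, 4], [6, 7, 8]])
```

The mean square between classes grows with k even when A is unchanged, which is the reason given above for not calling it a variance.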

In conclusion the speaker would like to compliment Mr. Bartlett 
for his work on transformations, which has been only lightly touched 
upon in the last section. He had had some practical experience 
with the logarithmic transformation and its utility for some purposes 
was exceedingly impressive. For example, in an investigation on 
growth of forest timber it served a double purpose, since it both
rendered the growth curve approximately linear and equalized the
variances at all ages from seedlings to full-grown trees. From other 
evidence he believed that similar conditions would be found over a 
surprisingly wide range of observations on plant growth. The subject 
of transformations seemed likely to repay study by anyone who 
had to conduct statistical analysis of biological data. 
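
The double purpose of the logarithmic transformation can be seen in a small sketch under two assumptions made purely for illustration: that typical size multiplies by a constant factor from one age to the next, and that the spread at each age is proportional to size. The figures are hypothetical and are not the forest timber data referred to.

```python
from math import log
from statistics import stdev

# Hypothetical typical sizes at three ages; each group is its typical size
# multiplied by the same factors, i.e. variation proportional to size.
typical = {1: 2.0, 2: 8.0, 3: 32.0}
factors = (0.8, 1.0, 1.25)
groups = {age: [size * f for f in factors] for age, size in typical.items()}

raw_sd = {age: stdev(vals) for age, vals in groups.items()}
log_sd = {age: stdev([log(v) for v in vals]) for age, vals in groups.items()}
log_centre = {age: log(size) for age, size in typical.items()}
```

On the original scale the standard deviation rises with age; on the logarithmic scale it is the same at every age, and the logarithms of the typical sizes rise by equal steps.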

Mr. M. S. Bartlett, replying, said : I will take advantage of the 
Chairman’s invitation to reply at length in the Supplement, and 
simply say that I am grateful for the very favourable reception given 
to the paper, and wish to thank all those who have contributed 
to the discussion. 

Perhaps there are one or two points I might usefully make now. 
First, Mr. Gosset was naturally interested in the results of the 
cotton experiments as well as in their statistical aspects, and it possibly
appeared rather one-sided for me to refer simply to their
statistical side, but I felt that that was after all my province. The 
cotton experiments are the work of the Research Scheme in which 
Dr. Crowther took part, and have been dealt with very adequately 
and fully in the papers which have been published by the Royal 
Agricultural Society of Egypt. The experiments were of the complex 
type, as commented on by Dr. Wishart; and there again, a fuller 
explanation of why these particular layouts were adopted might be 
found in these other publications. Naturally consideration was 
given to various possibilities, and in this connection I should point 
out that in Egypt hand labour is adopted for sowing and harvesting 
the plots—which made it quite practicable to conduct a number of 
fairly large experiments on representative sites, and investigate the 
growth of the plants on those sites as fully as possible. 

With reference to Dr. Irwin’s query, the problem of several missing 
cows could of course theoretically be solved as Dr. Irwin suggests, 
but this would be very laborious, and one would usually be content 
with the approximate test, and estimate the values for the missing 
cows in the ordinary way. Dr. Irwin had also referred to the use of 
regression of nitrogen effect on pH values in Dr. Lewis’s experiment, 
so that I ought in fairness to remind him of the value in such problems 
of co-operation with the experimenter. In this case Dr. Lewis had 
come along and said “ Look at the correlation with pH. What 
about it ? ” 

The point raised by Mr. Fairfield Smith on variance is one that 
perhaps Professor Fisher should be most competent to answer; but 
it seems useful to think of the variation present in the data and to 
recognize that the analysis of variance is usually the best method of 
separating out that variation into different parts, each contributing 
to the total variation observed. 

Mr. Bartlett later wrote as follows :—

I think I might conveniently deal again with Mr. Fairfield Smith’s 
query about variance at once, for I rather misunderstood his spoken 
comments at the meeting. My impression then had been that he was
questioning the use of “ variance ” as applied to samples, in distinction
to populations. To this I saw no objection; in the same dual
way we speak of means or correlation coefficients. But actually he 
was referring to the use of “ variance ” to designate in an analysis of
variance table an item which real differences between groups have
greatly inflated. In such cases I entirely agree that the term
“ mean square ” is preferable, and believe that my original use of
“ variance ” in the paper in some of the tables was sadly at variance
(sic) with orthodox Fisherian terminology. I have accordingly 
corrected this retrospectively, and am grateful to Mr. Fairfield Smith 
for raising this point. 

Dr. Crowther, writing from Egypt, asks me to thank Mr. Gosset 
for his appreciative remarks on the cotton experiments. His further 
comments are as follows :— 

“ Regarding Mr. Gosset’s query whether change in the form of the
nitrogenous fertilizer was an important factor, the point is a delicate 
one because of commercial interests in the different products, but I 
do not think that type of nitrogen was of primary importance. The 
Chemical Section of the Egyptian Ministry of Agriculture has expressed
the opinion that there is no immediate difference for the cotton
crop whether the nitrogen is supplied as Nitrate of Soda, Calcium 
Nitrate, Nitrochalk or Sulphate of Ammonia. 

I consider the most important factor determining the increase in 
yield from added nitrogen in recent years to be not spacing but 
variety, the medium-staple varieties now popular producing larger
increases than the long-staple varieties popular a few years ago. 
Spacing follows next in importance after variety. Bulletins 
published recently give further evidence confirming these conclusions.”

In reply to Dr. Irwin’s comments about the theory of interactions, 
I am inclined to sympathize with Mr. Cochran’s method of demonstration!
A formal proof of some of the corresponding algebraic
identities would be of interest; but it seems to me that any such proof 
would be somewhat incomplete and less instructive unless constants 
were inserted to represent postulated treatment effects. Formal 
proofs in which all treatment effects are put equal to zero would tend 
to have a certain air of unreality about them. Dr. Irwin has in mind, 
I take it, the idea of elementary proofs suitable for various grades of 
student. From the mathematical point of view, some kind of 
“ shorthand ” symbolism can usually make such a problem easier.
Thus a good deal of the technique of analysis of variance is conveniently
represented by use of vector notation. For the case here of
several interrelated classifications, a wider kind of tensor algebra 
might prove most powerful. 

Dr. Wishart noted an inconsistency in sign in my multiplying 
scheme, on p. 160, which I have now put right in order to avoid possible 
confusion. Perhaps I should explain that the coefficients −3, −1,
1, 3, are proportional to the values of the “ independent variate ”
$x - \bar{x}$ in a regression problem when x is given the step-values 0, 1,
2, 3. The main manuring regression effect is obtained by multiplying
the corresponding manuring totals by these coefficients, squaring the
value obtained, and dividing by $16(3^2 + 1^2 + 1^2 + 3^2) = 16 \times 20$,
giving 89,111 as in Table XVII (the additional dividing factor 16
arises from the sixteen plots contributing to each manuring total).

Similarly for the spacing regression (the signs in 3, 1, −1, −3,
are reversed for convenience to give a positive regression). For the
interaction, the individual entries in Table XVI are multiplied by the
coefficients in the multiplying scheme given, the resulting figure
being squared, and divided by
$4\{(9^2 + 3^2 + 3^2 + 9^2) + (3^2 + 1^2 + \ldots) + \ldots\} = 4(3^2 + 1^2 + 1^2 + 3^2)(3^2 + 1^2 + 1^2 + 3^2) = 4 \times 20^2$, giving
7,208, as in Table XVII (the additional dividing factor 4 arises from
the four plots contributing to each separate treatment total).
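
The arithmetic of these divisors may be sketched as follows. The manuring totals are hypothetical (the entries of Table XVI are not reproduced here), and the function name is introduced for illustration; only the coefficients and the numbers of plots are taken from the explanation above.

```python
def contrast_sum_of_squares(totals, coefficients, plots_per_total):
    """Single-degree-of-freedom sum of squares for a linear contrast of totals."""
    contrast = sum(c * t for c, t in zip(coefficients, totals))
    divisor = plots_per_total * sum(c * c for c in coefficients)
    return contrast ** 2 / divisor, divisor

linear = (-3, -1, 1, 3)

# Hypothetical manuring totals, each the sum of 16 plots.
manuring_totals = (100, 110, 125, 130)
ss_linear, divisor_linear = contrast_sum_of_squares(manuring_totals, linear, 16)

# Interaction coefficients are the products of the two linear sets
# (spacing signs reversed); each treatment total is the sum of 4 plots.
spacing = (3, 1, -1, -3)
interaction = [m * s for m in linear for s in spacing]
divisor_interaction = 4 * sum(c * c for c in interaction)
```

The two divisors come out as 16 × 20 and 4 × 20², as stated.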

Dr. Wishart’s Latin square experiment with dairy cows was an 
interesting extension of the change-over type of trial, though (apart 
from the unsuitability of short change-over periods in some cases, as 
noted in the paper), I should be rather doubtful of using such a Latin 
square design with only one cow per group owing to the natural 
hazards. Dr. Wishart himself admits this, and suggests that more 
than one cow per group would be an improvement. 

Of the many interesting points made by the different speakers, the 
only remaining one which seems to invite comment is Mr. Fairfield 
Smith’s problem of a damaged patch or block in a field experiment. 
As in the problem of salty patches in the Egyptian experiments, the 
precise nature of the information actually supplied by the results 
needs considering, and I do not think anyone would cavil at Mr. 
Fairfield Smith’s own suggestions. The only general remark I would 
add is that the occasional need for a specially adapted analysis should 
not reflect adversely on the more orthodox one. Statistical tests and 
analyses range from those that are rarely applicable to those that are 
rarely inapplicable. The analysis of variance technique belongs to 
the latter group, and no exceptional instances will make us forget its 
consequent value. 





Examples of Statistical Methods in
Forest Products Research.

By E. D. van Rest, B.A., B.Sc.

(Forest Products Research Laboratory.)

[Read before the Industrial and Agricultural Research Section of the Royal
Statistical Society, May 27th, 1937, Sir Roy L. Robinson, O.B.E., in
the Chair.]

CONTENTS.

I. The work of the Forest Products Research Laboratory.

II. Variability of wood as it concerns (a) the user, (b) the experimenter
on wood.

III. Examples.

IV. Some problems involving statistical reasoning.


I. The Work of the Forest Products Research Laboratory. 

Wood is such an adaptable material, being strong for its weight,
easily worked, nailed and bent, that it is not surprising that it plays 
a large part in the things we use. Nor is it surprising that such a 
universal material should tend to be overlooked or neglected in 
comparison with newer materials. Among its numerous uses are 
structural work, both temporary and permanent, house-building, 
furniture, floors, panelling, ship- and aeroplane-building, docks and 
sluice-gates, vats, railway sleepers, railway rolling-stock, printing- 
rollers, printing-blocks, crates, boxes, paper, cellulose and all cellulose 
products such as paints, cellophane and explosives. 

About 95 per cent. of the timber we use is imported, some in a
form ready for use, some “ unmanufactured.” In 1933 the quantity
of unmanufactured timber imported was over 708 million cubic feet,
valued at nearly £38 million; the total value of all wood used in the United
Kingdom during the same year was £44 million.

With these extensive and varied uses problems in the use of wood 
are bound to arise, and it is the purpose of the Forest Products 
Research Laboratory to study these problems in order to assist the 
user. 

Although it is the purpose of this paper to describe how statistical 
methods help in this work, some description of the material itself will 
help in understanding the problems, and this I propose to give with 
the aid of some lantern slides and a short film. 

II. Variability. 

The Laboratory is required to answer enquiries and provide 
information about a material which is essentially variable. In this
there is obviously scope for statistical reasoning and methods. Before
we can apply these methods, however, it is essential to consider how 
variability of the material will affect the user. Its effect will vary in 
each case, but in general it may be said that variability affects the 
user by making him uncertain of the properties of an individual 
piece which he proposes to use. Experience leads the user to believe 
that certain species provide wood having properties within the range 
he desires. He cannot, however, assume that any particular piece 
will have at least the average value of that property. He must fix a 
lower value to get one which will almost always be exceeded (a “ safe ” 
value). The average value is multiplied by a safety factor. In so 
far as this factor deals with variation in the material (and not with 
doubt as to the actual conditions under which the piece is to be used), 
it is a function * of the distribution of the property in the material, 
and should be based on a study of variation. None of the factors of 
safety in use for wood is based on any such study. It is obvious 
that for the marketing of a timber hitherto unknown such a study 
will provide reliable factors of safety, more reliable than those 
provided by experience, and more easily controlled in the sense that 
they can be changed as changes occur in the quality of wood coming 
on to the market. Factors of safety are usually associated with 
strength, but similar factors are required for a great many other 
properties. The allowance to be made for shrinkage, the amount of 
creosote necessary to give a certain life, the setting of knives on 
machines used for cutting wood—these are all variable, and limits 
are required. This can be illustrated by reference to the last example, 
the machining of wood. Some woods, probably because of their 
strength properties, do not machine so readily nor so well as others. 
The most common faults are that pieces are chipped out or the 
surface is left woolly. Theoretical investigation 1 suggested and 
experiment 2 has shown that such difficulties can be overcome by 
changing the angle at which the cutting knives are presented to the 
wood. In extreme cases this change brings other difficulties, so that 
it is important to limit the angle to that necessary to produce a good 
surface on, say, 90 per cent. of the surface passing through the
machine. A study of the variation of the particular strength property
which necessitates the change of angle would enable the limiting
angle to be chosen. 


* For instance, for a normally distributed property with mean M and
variance $s^2$ the limiting “ safe ” value L (high or low according to the property
under consideration) is

$$L = M \pm as = M\left(1 \pm \frac{as}{M}\right),$$

so that the factor of safety $= \left(1 \pm \dfrac{as}{M}\right)$.
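
A minimal sketch of such a limit, assuming a normally distributed property and a lower “ safe ” value which a stated fraction of pieces should exceed. The mean, standard deviation and fraction are hypothetical, and the multiplier is taken here as the standard normal deviate for that fraction, which is an assumption of this sketch.

```python
from statistics import NormalDist

def safe_value(mean, sd, fraction_exceeding=0.99):
    """Lower limit exceeded by the given fraction of a normal population."""
    a = NormalDist().inv_cdf(fraction_exceeding)   # standard normal deviate
    limit = mean - a * sd
    return limit, limit / mean                     # limit and factor of safety

# Hypothetical strength figures in arbitrary units.
limit, factor = safe_value(mean=100.0, sd=10.0, fraction_exceeding=0.99)
```

The factor depends only on the ratio of the standard deviation to the mean, so a more variable timber of the same average strength receives a smaller factor.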





We conclude that the user is, in the case of almost every property, 
interested in the distribution of that property, and not only in its 
average value; in fact, he is hardly ever interested in the average 
value itself, but in one or other of the two limits within which a 
certain large fraction of the population of values may be expected
to lie. 

It is necessary frequently to insist on this conception of a distribution,
for many experimenters are still prone to believe in the
existence of a single true value for the property they are measuring 
and to which their averages are an approach. The fact that the 
average is an important estimate of a parameter used to describe a 
population helps this illusion, for it appears to the experimenter that 
the statistician is after the same thing as himself. Once the experimenter
realizes that he is dealing with a real distribution of quantities,
he should welcome statistical methods, because they enable him to 
perform operations with distributions which he is accustomed to 
perform with the aid of arithmetic on single quantities. 

Turning to the effect of variability as it concerns the experimenter 
on wood, we find there are two classes of problem which confront 
him. One is to describe populations of which samples are provided— 
for example, descriptions of one botanical species; the other is to 
carry out trials of treatments on wood. The second of these problems 
corresponds to the use of the soil as a medium for agricultural experiments.
The division is a convenient one, because the two classes
of problem involve slightly different conceptions even though the 
methods used are the same. 

In the first class of problem, populations are to be described by 
the use of a few parameters, and limits are to be calculated which 
exclude certain fractions of those populations. In addition, the 
parameters describing the populations must be estimated from 
samples, and particular care must be taken that the samples provide 
true estimates of the parameters of the parent population. Timber 
is usually bought by species; but very few species are localized in 
their place of growth, so that we may have timbers grown under 
widely different conditions being sold under the same name and, 
therefore, required to be described by the same parameters. Not all 
the sources of any one species can be sampled, so that we have the 
impossible situation of desiring to sample a population all the 
individuals of which are not fully and equally available. This problem 
is discussed later in the paper and recommendations are made. 

In the second class of problem, in which wood is a medium only, 
the various treatments have to be compared as precisely as possible; 
that is, with the least possible interference from the variability of the 
medium. The most precise way of comparing treatments is to use
exactly similar pieces of wood for all the treatments. This is impossible,
so, since we often cannot measure before the treatments are
applied the variation among the pieces used, we must content ourselves
with estimating the variation from repetitions of the same
treatment. This estimate will only be a valid one if the differences 
between pieces chosen for repetitions are representative of differences 
between the pieces used for different treatments. The only way to 
make the one set representative of the other is to choose them at 
random. The distributions of estimates made from random samples 
are known, whereas those from a systematically chosen sample are 
peculiar to the material and cannot be studied in a general way. This 
is the reason for the element of randomization to be found in well- 
designed experiments. 

Systematic arrangements are, however, not entirely ruled out, 
since it is often possible to ensure that all the variation in the material 
does not enter into the comparison; for example, in the randomized- 
block arrangement the differences between blocks are not included in 
the error. The element of randomization is then only required where 
the error is estimated—that is, within blocks. In Latin square 
arrangements the system consists of making each row and column a 
complete set of treatments. The error then consists of those differences
which exist apart from row and column differences; it is,
therefore, essential for the estimation of error that, subject to the 
system, the arrangement of treatments within the rows and columns 
should be at random. The advantage of arrangements such as these 
is that they enable the comparison of treatments to be made over a 
wide range of material, yet keep the actual error of comparison low. 
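
The two arrangements just described can be sketched as below. The Latin square is obtained by shuffling the rows, columns and treatment labels of a cyclic square, which is a simple device assumed here for illustration and does not sample all possible Latin squares with equal chance. The treatment labels and function names are hypothetical.

```python
import random

def random_latin_square(treatments, rng=random):
    """Shuffle rows, columns and labels of a cyclic square."""
    n = len(treatments)
    labels = list(treatments)
    rng.shuffle(labels)
    rows = list(range(n))
    cols = list(range(n))
    rng.shuffle(rows)
    rng.shuffle(cols)
    return [[labels[(r + c) % n] for c in cols] for r in rows]

def is_latin(square):
    """Every row and every column is a complete set of treatments."""
    full = set(square[0])
    return (len(full) == len(square)
            and all(set(row) == full for row in square)
            and all(set(col) == full for col in zip(*square)))

def randomized_blocks(treatments, n_blocks, rng=random):
    """An independent random order of the full treatment set within each block."""
    return [rng.sample(list(treatments), len(treatments)) for _ in range(n_blocks)]

square = random_latin_square("ABCD", random.Random(1937))
blocks = randomized_blocks("ABCD", 3, random.Random(1937))
```

In both cases the system fixes which differences are kept out of the error, and chance decides everything else.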

III. Examples. 

As the examples which follow show, there is hardly a statistical 
method which does not find some application in our work. I have
said that our problems fall into two classes: the description of
populations and the use of a variable material as a medium for
treatments; and it will be convenient to keep that classification for
the examples. 

A. Description of Populations.

(a) Measurements for Diagnostic Purposes.

One of the earliest applications of statistical method in this 
Laboratory was to the problem of identification. The problem is an 
important one both to the user and the seller, since closely related 
species which are not equivalent in properties may grow together in 
the forest, be extracted together and sent mixed to the market, 
where the resultant large variability may reduce the value of the
consignment. Canadian hard and soft maple are two such closely
related species; the inclusion of even a few boards of the soft species
in a floor would spoil the floor as a whole because of the uneven wear. 
The foliage and fruit are the basis of the classification, so that once 
the log is converted to boards there is no certain clue to the species. 
It is only by examining a large number of pieces that certain features 
of the structure become known as characteristic of certain timbers. 
While the majority of timbers can be identified by an accumulation 
of such characteristic features, it sometimes happens, as in the 
example quoted, that the woods of two species are distinguishable 
only by the sizes of their structural elements. The variation in this 
measurement to be expected within the species becomes important 
if such measurements are to be used diagnostically. The principles 
underlying this use have been discussed fully by Rendle and Clarke. 3 
They involve the estimation of variance and choice of number of 
measurements, so that the variance of the arithmetic means may be 
low enough to give a significant separation to the two species. The 
problem is complicated by the fact that a single piece of wood cannot 
be called a random sample of the elements of that species. The fact 
that the elements are adjacent to one another in the growing tree 
gives them a certain dependence, so that until a fair number of pieces 
from different trees and from different parts of trees have been 
examined, the variation to be expected is not fully known. 
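
The choice of number of measurements can be illustrated by a rough calculation, which is not the procedure of Rendle and Clarke but a common textbook approximation assumed here: with n independent measurements on each species and a common standard deviation, the standard error of the difference of the two means is s√(2/n), and n is taken large enough for a given difference to reach a chosen normal deviate. The figures are hypothetical, and the assumption of independence is exactly what the paragraph above warns may fail within a single piece.

```python
from math import ceil
from statistics import NormalDist

def measurements_needed(sd, difference, level=0.05):
    """Smallest n per species for which `difference` is at least z standard
    errors of the difference between two means of n measurements each."""
    z = NormalDist().inv_cdf(1 - level / 2)
    return ceil(2 * (z * sd / difference) ** 2)

# Hypothetical figures: element sizes with standard deviation 20 units,
# species means 10 units apart.
n = measurements_needed(sd=20.0, difference=10.0)
```

Halving the difference to be detected multiplies the number of measurements required by four.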

(b) Examples of Poisson Distributions. 

Many woods are characterized by the grouping of the vessels (the 
vessels are the large pores which can be seen on the cross section). 
In making an examination of three woods for which an additional 
diagnostic feature was required, it was found that a simple transformation
made the distribution of group sizes describable by a
Poisson series. The expected and actual frequencies are set out in 
Table I. 

Table I.

Frequency of Occurrence of Different-sized Groups of Vessels in the
Wood of Shorea leprosula.

No. of Vessels in Group.     Frequency.     Expected Frequency.
          1                     843               838.5
          2                     143               151.9
          3                      18                13.7
          4                       1                 0.8

The discrepancies are measured by χ² = 1.94, for which, there
being one degree of freedom, the probability is greater than 0.1.





The class “ zero vessels per group ” does not, of course, exist, but 
the simple transformation of n to n — 1 enables a Poisson series to 
be fitted. 
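
The fit in Table I can be retraced with the short sketch below, which fits a Poisson series to n − 1 using the observed mean and pools groups of three or more vessels for the test. The pooling rule is an assumption of this sketch; with it the expected frequencies agree with Table I and χ² comes out close to the value quoted, the exact figure depending on how the expected frequencies are rounded and pooled.

```python
from math import erfc, exp, factorial, sqrt

observed = {1: 843, 2: 143, 3: 18, 4: 1}      # group size -> frequency (Table I)

total = sum(observed.values())
# Fit a Poisson series to k = n - 1; its mean is the only fitted parameter.
m = sum((n - 1) * f for n, f in observed.items()) / total
expected = {n: total * exp(-m) * m ** (n - 1) / factorial(n - 1) for n in observed}

# Pool groups of 3 or more vessels so that no expected frequency is very small.
obs_classes = [observed[1], observed[2], observed[3] + observed[4]]
exp_classes = [expected[1], expected[2], total - expected[1] - expected[2]]
chi2 = sum((o - e) ** 2 / e for o, e in zip(obs_classes, exp_classes))

# Three classes, less one for the total and one for the fitted mean,
# leave one degree of freedom.
p_value = erfc(sqrt(chi2 / 2))                # upper tail of chi-square, 1 d.f.
```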

The same transformation was used 4 to fit a Poisson series to the 
grouping of eggs laid by the Lyctus beetle. Two species, L. brunneus
and L. linearis, gave means differing by unity. These distributions
may conceivably prove of use as diagnostic features or, in the case 
of the eggs, for interpretation of differences in fertility under differing 
conditions. 


(c) The Minimum Radius of Bending. 


It has been found useful to express the bending quality of a
timber as the maximum curvature of the form around which a piece
of wood 1 inch thick can be bent without failure. This maximum
curvature could be determined by bending a large number of pieces
to diminishing radii until the critical value was reached. This would
be a lengthy process and subject to an undetermined error on account
of variation in the material. Stevens 5 uses measurements of maximum
tensile and compressive strength to calculate a maximum
curvature, $\dfrac{T + C}{S(1 - C)}$ (if bent with a strap), $\dfrac{T + C'}{S(1 - C')}$
(if bent without a strap), where T and C are the maximum strains in
tension and compression respectively, and C' is the strain in compression
corresponding to the maximum strain in tension; S is the
thickness (taken as 1 inch for comparative purposes). Fair agreement
has been obtained between the calculated and observed minimum
radii for bending without a strap, and the discrepancies for bending
with a strap are consistent enough to make the figure a fair index of
bending quality. There is required, however, an estimate of the
variance of the calculated minimum radius of bending, so that differences
between timbers may be accorded their proper significance and
the number of compression and tension tests to suit the degree of
differentiation required may be chosen. Geary 6 gives the frequency
distribution of the quotient of two normal variates by transforming
the quotient.


If $z = \dfrac{b + y}{a + x}$, where x and y have a correlation coefficient r,
variances $\alpha^2$ and $\beta^2$, and the means of x and y are both zero,
then

$$t = \frac{az - b}{\sqrt{\alpha^2 z^2 - 2r\alpha\beta z + \beta^2}}$$

is normally distributed about zero
as mean with unit standard deviation, provided that $(a + x)$ is unlikely
to be negative. Table II shows the necessary data for maple
wood when bent with a strap.





Table II.

Maximum Strains of Steamed Maple Wood.

In Tension (T).     In Compression (C).
     0.0173               0.2114
     0.0156               0.2342
     0.0152               0.3256
     0.0180               0.2199
     0.0159               0.2628
     0.0152               0.3428
     0.0103               0.2314
     0.0138               0.2628

Each compression piece is matched to the tension piece of the
same line.

$$
\begin{aligned}
x &= 1 - C, \qquad y = T + C \quad \text{(each measured from its mean)} \\
b &= \overline{T + C} = 2765.25 \times 10^{-4} \\
a &= \overline{1 - C} = 7386.375 \times 10^{-4} \\
r &= -1.0 \quad \text{(because of the smallness of $T$ compared with $C$)} \\
\alpha^2 &= 237{,}337 \times 10^{-8}, \qquad \alpha = 487.18 \times 10^{-4} \\
\beta^2 &= 235{,}433 \times 10^{-8}, \qquad \beta = 485.21 \times 10^{-4}
\end{aligned}
$$

To get the 5 per cent. points of z put t = 1.96.

The solution of the resulting quadratic gives the radius of curvature
R = 1.73 inches or 4.59 inches. These are respectively the
lower and upper limits within which the minimum radius of 95 per
cent. of pieces of that species will lie if the variation is estimated
from the eight observations made. In such a way, but using more 
extensive observations, the user is enabled to decide from simple 
tension and compression tests whether a timber is suitable for his 
purpose. 
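
The calculation can be retraced from Table II with the sketch below. It assumes, as the definitions of x and y suggest, that the curvature for bending with a strap is (T + C)/(1 − C) for unit thickness, that the variances are estimated with n − 1 as divisor, and that r = −1.0 and t = 1.96 as in the text. Squaring the expression for t gives a quadratic in z whose two roots are the limiting curvatures.

```python
from math import sqrt
from statistics import mean, stdev

tension = [0.0173, 0.0156, 0.0152, 0.0180, 0.0159, 0.0152, 0.0103, 0.0138]
compression = [0.2114, 0.2342, 0.3256, 0.2199, 0.2628, 0.3428, 0.2314, 0.2628]

denominators = [1 - c for c in compression]                  # 1 - C
numerators = [t + c for t, c in zip(tension, compression)]   # T + C

a, alpha = mean(denominators), stdev(denominators)
b, beta = mean(numerators), stdev(numerators)
r, t = -1.0, 1.96          # values used in the text

# (a z - b)^2 = t^2 (alpha^2 z^2 - 2 r alpha beta z + beta^2), a quadratic in z.
qa = a * a - t * t * alpha * alpha
qb = -2 * (a * b - t * t * r * alpha * beta)
qc = b * b - t * t * beta * beta
root = sqrt(qb * qb - 4 * qa * qc)
curvatures = sorted([(-qb - root) / (2 * qa), (-qb + root) / (2 * qa)])

thickness = 1.0            # inches
# The smaller curvature gives the larger radius.
radius_upper, radius_lower = (thickness / z for z in curvatures)
```

Run on the eight pairs of Table II, this gives limits agreeing with those quoted to within rounding.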


(d) Liability to Attack by Lyctus spp.

The Lyctus Powder-post beetles are the cause of serious loss to 
many trades in this country, notably those manufacturing furniture, 
walking-sticks, tool-handles, tennis racquets, flooring- and paving- 
blocks, aircraft, railway rolling-stock and rifles. 

The eggs of Lyctus are laid in the vessels of the sapwood of certain 
newly seasoned hardwoods—for example, oak, ash, elm and walnut— 
and hatch in 8-12 days. The larvae tunnel in the direction of the 
grain, reducing the wood to a fine powder from which they derive the 
name “ Powder-post.” Since the eggs are laid always in vessels,
Fisher and Clarke 7 advanced the theory that liability to attack is
based on vessel size. Fig. 1 gives the distributions of vessel diameters
for several timbers, and the average (Y) and smallest (X) recorded
diameter of the eggs of Lyctus brunneus. According to this theory,
sycamore, among others, should not be attacked at all, while willow
might be attacked very rarely. This was in accordance with experience
at the time, but cases of Lyctus attack in these two woods were
later reported. Parkin, 8 however, was able to show that the egg is
capable of having its normal dimension altered in order to fit into a
restricted space. The limiting factor for liability to attack is then
the size of the ovipositor, since this is inserted into the vessel in which
the egg is to be laid. The lines AA, BB in Fig. 1 indicate the average
and minimum recorded diameters of the ovipositor of Lyctus brunneus,
and show that of the woods the vessel diameters of which were
measured by Clarke, only Horse chestnut (Aesculus sp.) should be
completely immune because the pores are too small. Cherry and
apple should be attacked but rarely, since eggs can be laid in these
only by beetles with ovipositors below the average diameter and in
vessels above the average size. From more recent investigations
there are indications that eggs laid in vessels with a diameter less than
about 85 µ will not hatch, so that there may be a definite limit slightly
above that set by ovipositor size.

Fig. 1. Curves showing the Distributions and Observed Ranges of Vessel Diameter in Several Woods.
XX is the minimum and YY the average diameter recorded for the egg of L. brunneus. AA is the minimum and
BB the average diameter recorded for the ovipositor of L. brunneus.

Because of the peculiar method of egg-laying adopted by the 
Lyctus Powder-post beetles, it is possible for infested timber to pass 
through various manufacturing processes without the insects being 
killed or the attack noticed. Little sign of the attack may be 
apparent externally until the first of the beetles bore their way out 
some twelve months after the eggs were laid, by which time the 
timber has probably been manufactured and sold. The necessity for 
replacing such infested material is one of the main sources of loss. A 
method of recognizing the liability of timber to Lyctus attack is 
therefore valuable, especially in connection with the introduction of 
new woods. 

This example is included because statistical methods are necessary 
in the study of the three distributions, vessel size, egg size, and 
ovipositor size, especially where the parameters of the distributions 
have to be estimated from samples. 

Although these examples of distributed properties are relatively 
simple, I have devoted this space to them, because I believe it is not 
yet generally recognized that distributed quantities can be described 
and compared and that, for the properties of wood at least, the spread 
of the distribution cannot be ignored. 


B. Wood as a Medium for Treatment. 

The second class of examples is concerned with wood as a medium 
for treatments. The variability of the medium introduces uncertainty 
into comparisons of treatments, an uncertainty which can be resolved 
only by statistical reasoning. The examples are chosen to illustrate 
common statistical processes. 





(a) Significance of a Difference. 

Timber, especially sapwood, stored in the open frequently becomes
stained, and it is required to know whether this staining affects the 
strength. A number of trees were cut into planks, alternate planks 
were dipped in a preservative solution and the remainder allowed to 
develop stain naturally. Test-pieces were cut from the sapwood of 
both and measurements 9 of the fibre stress at maximum load and of 
hardness were made, with the results that appear in Table III. 


Table III.

Effect of Sap-stain on Strength.

                                 Maximum Bending Strength,     Hardness on Radial
                                       lbs./sq. in.                Face, lbs.
                                 Stained.      Unstained.      Stained.   Unstained.

Number of tests                      40            100            40         100
Mean                              6,184          6,270           117         132
Sum of squares of deviations 16,799,390     30,459,499         8,655      27,244
Difference of means                        85                         15
S.D. of a difference of means           117.6                      2.881
t                                         0.7                       5.21


The data thus showed that a significant change took place in 
hardness, but not in bending strength. 
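The figures in the last three rows of Table III can be checked from the summary values above them. The sketch below assumes that the standard deviation of the difference is built from the two separate variance estimates, sqrt(SS1/(n1(n1 - 1)) + SS2/(n2(n2 - 1))), without pooling. That assumption reproduces the tabulated 117.6 and 2.881 closely, but the paper does not state the formula, and the function name is illustrative only.

```python
import math

def sd_of_difference(ss1, n1, ss2, n2):
    """S.D. of a difference of two means from sums of squared deviations.

    Assumed form: each mean has variance SS / (n * (n - 1)), and the two
    variances are added without pooling.
    """
    return math.sqrt(ss1 / (n1 * (n1 - 1)) + ss2 / (n2 * (n2 - 1)))

# Summary values from Table III (stained: 40 tests, unstained: 100 tests).
sd_bending = sd_of_difference(16_799_390, 40, 30_459_499, 100)
sd_hardness = sd_of_difference(8_655, 40, 27_244, 100)

# Differences of means are taken as tabulated (85 and 15).
t_bending = 85 / sd_bending
t_hardness = 15 / sd_hardness

print(round(sd_bending, 1), round(t_bending, 1))
print(round(sd_hardness, 3), round(t_hardness, 2))
```

With 138 degrees of freedom in all, a t near 0.7 is well within chance variation, while a t above 5 is not, which is the conclusion drawn in the text.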


(b) Regression. 

It has been noted by Clarke 10 that, on the whole, tropical timbers 
are less tough, though stronger in compression than north-temperate- 
zone timbers of the same specific gravity. A negative relation might 
be expected between the two strength properties for all timbers, 
irrespective of their classification into tropical or temperate zone, 
although the existence of a close relation of each of these properties 
to density is likely to mask this relationship unless a correction is 
made for density. The easiest way of establishing the relationship is 
to fit a two-term regression of, say, maximum crushing strength on
density and toughness, and to test the significance of the coefficient
of the second term. Data were available for 294 different timbers
from various parts of the world. 11 The squares and products necessary
for calculation of the regression are given in Table IV, in which
S = density × $10^{3}$; M = maximum crushing strength parallel to
the grain × $10^{-1}$ lbs./sq. inch, and T is the drop in inches of the 50-lb.
hammer used in the toughness test.






194 


van Rest —Examples of Statistical Methods [No. 2, 


Table IV.

Sums of Squares and Products used in Calculation
of Regression M = aS + bT.

            Total.        Mean.       Sums of Squares or Products.

S           16,520       56.19              51,819.4356
M          129,967      442.0646         7,124,139.1318
T           10,756       36.5748            59,723.9605
SM             —            —              493,912.8080
MT             —            —              175,673.3562
ST             —            —               29,179.8761

Each line is the sum of 294 terms.

The normal equations give *

$$c_{11} = +0.2662 \times 10^{-4}, \qquad c_{12} = -0.1300 \times 10^{-4}, \qquad c_{22} = +0.2310 \times 10^{-4}$$

and coefficients a = 10.864, b = −2.3665
in the equation M = aS + bT.

The standard deviation of b is 0.346, so that $t = \dfrac{2.3665}{0.346} = 6.85$,

making b certainly significant. Since b is negative, the suspected 
relationship is established. The existence of this relationship suggests 
that even for woods of the same specific gravity there is some com¬ 
ponent which increases the strength in compression at the expense 
of its toughness. Since timbers which are tough yet light in weight 
are not numerous, substitutes for the commonly used ash and hickory 
are much sought after; any advance, therefore, in the study of the 
causes of toughness is useful because of the assistance it can give in 
this search. 
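The multipliers and coefficients quoted from the normal equations can be recomputed from Table IV. The following is a minimal sketch, assuming the tabulated sums of squares and products are taken about the means, so that the c-multipliers are the elements of the inverse of the 2 × 2 matrix of sums for S and T. The variable names are illustrative.

```python
# Sums of squares and products from Table IV (294 timbers).
SS = 51_819.4356      # S with S
TT = 59_723.9605      # T with T
ST = 29_179.8761      # S with T
SM = 493_912.8080     # S with M
MT = 175_673.3562     # M with T

# Invert the 2 x 2 matrix [[SS, ST], [ST, TT]] to obtain the c-multipliers.
det = SS * TT - ST * ST
c11 = TT / det
c12 = -ST / det
c22 = SS / det

# Regression coefficients in M = aS + bT (variables measured from their means).
a = c11 * SM + c12 * MT
b = c12 * SM + c22 * MT

print(f"c11={c11:.4e} c12={c12:.4e} c22={c22:.4e}")
print(f"a={a:.3f} b={b:.4f}")
```

The negative sign of b is the point of the example: at a fixed density, greater toughness goes with lower crushing strength.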

(c) Analysis of Variance, including Regression.

While a great deal of the variation in strength of any species of 
wood is explainable in terms of the botanical structure, density, 
width of the annual rings, percentage of thick-walled tissue, and so 
on, these relationships are by no means well established, nor do they 
cover all the sources of variation. In an investigation on Sitka 
spruce 12 a quantity of material carefully weeded of abnormalities was 
available on which structural measurements had been made. An 
analysis was made in order to discover some clue to the source of the 
still-unexplained variation in strength. First, several likely regression 
equations were fitted and the best of these chosen. The material 
came from several different sites, but the differences between sites 

* The notation is that of Fisher, Statistical Methods for Research Workers,
Section 29.
















were ignored in choosing the best regression by considering only 
deviations from means of sites. Using this best equation, the total 
variance was analysed in the form of Table V. The various items of 


Table V.

Analysis of Variance of Maximum Crushing Strength of some Sitka
Spruce, using Two-term Regression.

Variation Ascribed to                    Sum of    Degrees of    Mean      z.     Remarks.
                                         Squares.  Freedom.      Square.

(a) Residuals from regressions *
    fitted to separate sites              23,177      115           203     —
(b) Differences between these
    separate site regressions              3,781       10           378    0.31   Insignificant.
(c) “Average” † regression
    fitted to separate sites              34,434        2        17,217     —     Obviously significant.

    Total “within sites”                  61,392      127

(d) Regression fitted to site means       14,958        2         7,479     —
(e) Residuals of site means to
    this regression                        6,513        3         2,171    1.18   Significant, but only
                                                                                  because of sites 5 and 7.

    Total between sites                   21,471        5

    Grand Total                           82,863      132

* The regressions were of the form: strength = a (specific gravity) + b (ring width).
† The “average” regression is calculated from the pooled deviations “within sites.”


this table are sources of variation the significance of which is tested 
by comparison with the first item (by the z test). Examination of 
the table shows that the different sites are not different in the 
relationship between strength, density and ring width existing within 
them, but that an appreciable source of variation exists between the 
site means. A detailed analysis of item (e) of Table V shows that sites
5 and 7 cause practically the whole of this variation. This enabled 
attention to be confined to these two sites, to see in what respect they 
differed visibly from the others. This led to the introduction of a 
third term to the regression—namely, ring age. With the new 
regression a new analysis was made (Table VI), which showed that
the site means still differed more than would be expected from
variation within the site. A detailed examination of item (e) this
time revealed only one aberrant site as the cause of the significantly 
large sum of squares under this item. The search was not carried 
farther. 
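The z values of Table V follow from the mean squares. As a minimal sketch, taking Fisher's z as half the natural logarithm of the ratio of a mean square to the residual mean square of item (a), the two tabulated values can be recovered as below. The function name is illustrative.

```python
import math

def fisher_z(mean_square, error_mean_square):
    """Fisher's z: half the natural log of the ratio of two mean squares."""
    return 0.5 * math.log(mean_square / error_mean_square)

error_ms = 203                      # item (a) of Table V
z_b = fisher_z(378, error_ms)       # item (b), differences between site regressions
z_e = fisher_z(2_171, error_ms)     # item (e), residuals of site means

print(round(z_b, 2), round(z_e, 2))
```

Each z is then referred to the tabulated significance points for the corresponding pair of degrees of freedom (10 and 115 for item (b), 3 and 115 for item (e)).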

The establishment of general relationships between strength and 
the visible structural characteristics is of assistance to the buyer in 
enabling grading rules and acceptance specifications to be drawn up 
which could be even more rigorous than actual strength tests, since 
these, being destructive, can only be made on a restricted sample. 







Table VI.

Analysis of Variance of Maximum Crushing Strength of some Sitka
Spruce, using a Three-term Regression.

Variation Ascribed to                    Sum of    Degrees of    Mean      z.     Remarks.
                                         Squares.  Freedom.      Square.

(a) Residuals from regressions *
    fitted to separate sites              21,825      109           200     —
(b) Differences between these
    separate regressions                   2,010       15           134     —     Insignificant.
(c) “Average” † regression
    fitted to separate sites              37,557        3        12,519     —     Obviously significant.

    Total “within sites”                  61,392      127

(d) Regression fitted to site means       12,423        3         4,141     —
(e) Residuals of site means to
    this regression                        9,048        2         4,524    1.56   Significant.

    Total between sites                   21,471        5

    Grand Total                           82,863      132

* The regressions are of the form: strength = a (specific gravity) + b (ring width) + c (ring age).
† The “average” regression is calculated from the pooled deviations “within sites.”


(d) Planning of Experiments. 

Experiments involving variable material ought, of course, to be 
planned with a view to the use of statistical methods. Most import¬ 
ant is the provision for an unbiased estimate of error. The following 
two experiments have not yet been carried out, but their plans and 
the reasoning leading to their adoption have been set out in the hope 
that they will provoke discussion. The subject of experiment design 
has not received so much attention in the past as analysis, yet the 
full benefit of a statistical analysis cannot be achieved without 
planning for it. 

(i) An experiment to test the power of discrimination of the Lyctus 
beetle for starchy wood. 

It has been stated 13 that the Lyctus female prefers starchy 
wood in which to lay its eggs. Since it is possible for the starch in 
the newly felled tree to be converted, this claim, if true, would en¬ 
hance the value of starch-free wood. There are many who claim 
that the present speeding-up of the drying process does not produce 
such acceptable material as “natural” seasoning, and it is possible
that one of the differences between the two processes is this conversion
of the starch. A simple piece of apparatus was designed of which the
geometry gave as far as possible a free choice between two pieces of
wood to any insect inserted into it. 

If we suppose that the insects have no discriminatory sense be¬ 
tween the two kinds of wood (starchy and starch-free), we can 








consider the consequences of that hypothesis and observe whether 
the actual results when one piece is starchy and the other starch-free 
are in accord with those consequences. If they have no discrimina¬ 
tion and N insects are placed in an apparatus where their choice is 
uninfluenced by the apparatus, then the relative frequencies with 
which 0, 1, 2 ... N insects will go to one piece of wood will be given
by the terms of the expansion $(\tfrac{1}{2} + \tfrac{1}{2})^{N}$. Any particular observed
value (M) of insects on the starchy wood can, therefore, be taken
as inconsistent with the original hypothesis if the chance of it
occurring is less than, say, 1 per cent. when the hypothesis is true.

If the terms of $(\tfrac{1}{2} + \tfrac{1}{2})^{N}$ are calculated for various values of N,
it will be seen that the extreme possible observation M = N is only
reduced below a 1 per cent. probability when N ≥ 7. Seven is, therefore,
the least number of insects which could give a significant answer,
on the 1 per cent. standard, to the question of discrimination. It
would, however, be unwise to require that all the insects should show
a preference in one direction before a hypothesis of no discrimination
is rejected, so some number greater than 7 should be chosen. For
instance, for N = 15 the occurrence of M = 13, 14 or 15 will serve to
reject the hypothesis.
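The two numerical statements above (seven as the least group size, and M = 13, 14 or 15 as the rejection region for N = 15) can be checked by summing the upper terms of the binomial expansion. The sketch below assumes a one-sided test at the 1 per cent. level; the function names are illustrative.

```python
from math import comb

def upper_tail(n, m):
    """P(M >= m) when each of n insects chooses either piece with chance 1/2."""
    return sum(comb(n, k) for k in range(m, n + 1)) / 2 ** n

def least_group_size(level=0.01):
    """Smallest n for which even the extreme result M = n has chance below level."""
    n = 1
    while upper_tail(n, n) >= level:
        n += 1
    return n

def critical_count(n, level=0.01):
    """Smallest m such that P(M >= m) is below level."""
    m = n
    while m > 0 and upper_tail(n, m - 1) < level:
        m -= 1
    return m

print(least_group_size())
print(critical_count(15), upper_tail(15, 13))
```

For N = 15 the tail from 13 upwards has probability 121/32768, below 1 per cent., whereas including M = 12 raises it to 576/32768, above 1 per cent.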

It might be thought better to put a larger number of insects 
through the trial, rather than use such small numbers, but there are 
several advantages in dividing the total number into groups and 
observing each group separately. These advantages are : 

(1) It is possible to neutralize any bias that may exist in the 
apparatus or its surroundings by making such alterations as the 
following in successive groups— 

(a) Exchange the positions of the two pieces of wood. 

(b) Turn the whole apparatus with respect to the light or
other factor.

(c) Use other pairs of pieces.

These should be treated factorially; that is, the side to side change
for (a) should be made in each of the positions under (b) and the whole
repeated for each pair of pieces. Thus if P pairs of pieces are used in
L positions of the apparatus, there will be 2 × P × L trials.

(2) While the test of the hypothesis would still be made on the
total, we should have the additional test of comparing the variance
of group observations with their theoretical variance. That is, if
$M_i$ is the number found on the starchy wood in the ith trial of N insects,

$$\sum_i \frac{(M_i - \tfrac{1}{2}N)^2}{\tfrac{1}{4}N}$$

should be distributed as $\chi^2$ for a number of degrees of freedom
equal to the number of groups.
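As an illustration of this second test, the statistic can be computed as below. The counts are hypothetical (the experiment had not been carried out), and the sketch assumes that the statistic is the sum over groups of the squared deviation from N/2 divided by the binomial variance N/4.

```python
def group_chi_square(counts, n):
    """Sum over groups of (M_i - n/2)^2 / (n/4)."""
    expected = n / 2
    variance = n / 4      # binomial variance n*p*q with p = q = 1/2
    return sum((m - expected) ** 2 / variance for m in counts)

# Hypothetical counts on the starchy piece for eight groups of 15 insects.
hypothetical_counts = [9, 6, 8, 7, 10, 5, 8, 7]
statistic = group_chi_square(hypothetical_counts, 15)
degrees_of_freedom = len(hypothetical_counts)

print(statistic, degrees_of_freedom)
```

A value much larger than the number of groups would suggest more variation between groups than the hypothesis of free choice allows, for example through bias in the apparatus.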





Whether the same few insects should be used over and over again 
is a matter for further discussion. It is possible that all the insects 
will not possess the power of discrimination to the same degree, and 
so a larger number would perhaps show greater deviations from the 
theoretical distribution than a few used several times. We are 
concerned, however, with the hypothesis that they have no power of 
discrimination, and so long as the average power of discrimination is 
well above the precision of the experiment, it will be detected. The 
argument against the use of the same insects over and over again is the 
possible formation of habit tending to give the same result for each 
trial. 

Having chosen the size of the group, it is of interest to determine 
what degree of discrimination this sized group will have the precision
to detect. It is possible that the insect's choice is modified by some 
other qualities of the wood, and this would cause it to appear that it 
had an imperfect sense of discrimination. We may define the power 
or degree of discrimination as the relative frequency of choice in one 
direction. Perfect discrimination will then be indicated by 1.0 and
perfect lack of discrimination by 0.5.

If the least discrimination detectable by a trial is p', then,

$$\frac{N(p' - \tfrac{1}{2})}{\tfrac{1}{2}\sqrt{N} + \sqrt{Np'q'}} = 2.326,$$

which for N = 15 gives p' = 0.94

and for N = 120 (= 8 × 15) gives p' = 0.6,

which is as low as is reasonably required, since a lower sense of dis¬ 
crimination would not be of economic importance. 

To complete the discussion on the design of this experiment, consider
the effect of giving the insect three or more choices instead of
two. For one piece starchy out of three the least value of p' is
given by

$$\frac{N(p' - \tfrac{1}{3})}{\tfrac{1}{3}\sqrt{2N} + \sqrt{Np'q'}} = 2.326.$$

For N = 15, p' = 0.84. This is a decrease in p', and trial shows
that 5 is now the least number permissible in a group instead of 7.
This might be of use if it were desired to reduce the number of
insects used.
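The values of p' for groups of 15 can be found numerically. The sketch below rests on a reading of the printed condition as N(p' − p0) divided by the sum of the standard deviations under the hypothesis and under p', set equal to 2.326 (the one-sided 1 per cent. normal deviate), with q' = 1 − p' and p0 the chance of choosing the starchy piece when there is no discrimination. That reading, the bisection method and the function name are assumptions, not taken from the paper.

```python
import math

def least_detectable(n, p0, deviate=2.326, tol=1e-10):
    """Solve n(p - p0) / (sqrt(n p0 q0) + sqrt(n p q)) = deviate for p by bisection."""
    def excess(p):
        null_sd = math.sqrt(n * p0 * (1 - p0))
        alt_sd = math.sqrt(n * p * (1 - p))
        return n * (p - p0) / (null_sd + alt_sd) - deviate

    lo, hi = p0, 1.0
    if excess(hi) < 0:
        raise ValueError("group too small to detect even perfect discrimination")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if excess(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

p_two = least_detectable(15, 1 / 2)     # two pieces, one starchy
p_three = least_detectable(15, 1 / 3)   # three pieces, one starchy

print(round(p_two, 2), round(p_three, 2))
```

The same function with a third argument below the chance of no discrimination would serve for any number of alternative pieces.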

(ii) An experiment to compare four fire-retardant treatments on
material having certain limitations.

Pieces of wood treated in three different ways were to be compared
with each other and with untreated pieces over a fair range of
material of two species. Each species was to be tested with a rough
and a smooth surface, six months after treatment, and within a day
or two of treatment.




The limitations were, first, that not more than eight pieces could 
be cut from one board, and even these would not always be adjacent, 
and second, the rough and smooth material was already cut and could 
not be matched. Although it was desired to cover this wide range 
of material, the various interactions were relatively unimportant 
compared with the treatment differences; in any case it was inevit¬ 
able that the smoothness effect should be confounded with consign¬ 
ment differences, as the sawn and planed boards were supplied from 
different sources. A design consisting of complete randomized blocks 
of eight pieces was considered. The error appropriate to such a 
design would be the variation within boards of eight pieces. In view 
of the relative unimportance of the time effect, it was thought that 
this might well be confounded with blocks in order to use a smaller 
block. 

Previous experience made it appear that even blocks of four might 
provide too large an error. A reduction of size below four involves 
the use of incomplete randomised blocks (Yates 14 ). In this method 
any reduction in variance by the use of a smaller block is partly offset 
by the fact that the block variance is not entirely removed from error, 
so that it is necessary to investigate whether the desired reduction of 
error is brought about. Yates 14 has shown that the ratio of the 
error variances of the two designs is 

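$$\frac{\text{Complete blocks size } t}{\text{Incomplete blocks size } k} = \frac{1 - \dfrac{1}{k}}{1 - \dfrac{1}{t}} \cdot \frac{\sigma_t^2}{\sigma_k^2},$$

where $\sigma_t^2$ and $\sigma_k^2$ are the variances within blocks of size t and size k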
respectively. It can be shown that when, as is probable in the pre¬ 
sent case, a regular gradient of the property being measured exists 
along the board, 

<**“ i ~ 


so that the ratio of error variances is - 1 /. The blocks of two are 
therefore to be preferred in this instance. 

Table VII is a design using incomplete blocks of two. The six
possible pairs of the four treatments are each repeated in each con¬ 
signment of wood; the allotment of half-boards (= two test-pieces) 
to time of test and to pair of treatments is random, as is also the allot¬ 
ment to members of pairs. 

The design was afterwards doubled in size to accommodate a test 
which uses a pair of test-pieces to produce one result. It was neces¬ 
sary to pair at random for test those pieces which were treated alike 
as well as paired on the boards with like treatments. 




Table VII.

Design for an Experiment to Compare Four Treatments A, B, C and D.
(Pieces marked in capitals to be tested six months after treatment.)

Board No.    Species A,    Species A,    Species B,    Species B,
             Rough.        Smooth.       Rough.        Smooth.

    1        d b B D       c a A C       C D c d       d b B D
    2        c d C D       D B b d       B C b c       D B b d
    3        C D d c       b a B A       C D c d       a c C A
    4        b c B C       d a D A       C A c a       c b C B
    5        c b B C       C B c b       D A d a       D A a d
    6        d a D A       D A d a       c a C A       B A b a
    7        A D d a       B C b c       D A a d       C B b c
    8        c a A C       C A a c       B A a b       a c A C
    9        c a A C       d b B D       d b D B       C D d c
   10        D B b d       d c D C       C B c b       b a B A
   11        A B a b       c d D C       B D b d       d a A D
   12        a b B A       B A b a       A B a b       D C c d


By allotting the two halves of each board to the same treatments 
but at the two times of test, a separate comparison with a different
error variance is made possible for the time effect. The analysis of 
variance of the design is shown in Table VIII.* 
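The balance claimed for Table VII (each of the six possible pairs of treatments repeated in each consignment, and both halves of a board carrying the same pair) can be checked mechanically. In the sketch below the board entries are transcribed from Table VII; two entries that are damaged in the printed copy (boards 2 and 3 of Species A, Rough) are read as cdCD and CDdc, which is an assumption.

```python
from collections import Counter
from itertools import combinations

# Table VII, one string per board (boards 1 to 12) within each consignment.
# Lower case: tested soon after treatment; capitals: tested after six months.
design = {
    "Species A, Rough":  "dbBD cdCD CDdc bcBC cbBC daDA ADda caAC caAC DBbd ABab abBA",
    "Species A, Smooth": "caAC DBbd baBA daDA CBcb DAda BCbc CAac dbBD dcDC cdDC BAba",
    "Species B, Rough":  "CDcd BCbc CDcd CAca DAda caCA DAad BAab dbDB CBcb BDbd ABab",
    "Species B, Smooth": "dbBD DBbd acCA cbCB DAad BAba CBbc acAC CDdc baBA daAD DCcd",
}

all_pairs = {frozenset(p) for p in combinations("ABCD", 2)}

def pair_counts(boards):
    """Count how often each treatment pair occurs among the boards of a consignment."""
    counts = Counter()
    for board in boards.split():
        early = frozenset(ch.upper() for ch in board if ch.islower())
        late = frozenset(ch for ch in board if ch.isupper())
        # Both halves of a board must carry the same pair of treatments.
        assert early == late and len(early) == 2, board
        counts[early] += 1
    return counts

balanced = all(
    set(pair_counts(b)) == all_pairs and set(pair_counts(b).values()) == {2}
    for b in design.values()
)
print(balanced)
```

Under this reading every pair occurs exactly twice in each of the four consignments, giving 48 boards in all.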

Table VIII.

Source of Variation.                          Degrees of Freedom.

Treatments                                             3
Interactions:
  Treatments × Time                                    3
  Treatments × Consignments                            9
  Treatments × Time × Consignments                     9
Error (i)                                             72
    Total within matched pairs                        96

Time                                                   1
Time × Consignments                                    3
Error (ii)                                            44
    Total between board halves                        48

Between boards                                        47

    Grand Total                                      191

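The partition of degrees of freedom in Table VIII can be rebuilt from the layout of the design. The sketch assumes 4 consignments of 12 boards, each board cut into two halves of two test-pieces, as in Table VII before it was doubled in size. The variable names are illustrative.

```python
consignments, boards_per_consignment = 4, 12
halves_per_board, pieces_per_half = 2, 2
treatments, times = 4, 2

boards = consignments * boards_per_consignment
pieces = boards * halves_per_board * pieces_per_half

# Three strata: between boards, between halves of a board, within halves.
between_boards = boards - 1
between_halves = boards * (halves_per_board - 1)
within_pairs = boards * halves_per_board * (pieces_per_half - 1)

t_df = treatments - 1
time_df = times - 1
cons_df = consignments - 1

# Error (i): what remains within matched pairs after treatments and their interactions.
error_i = within_pairs - (t_df + t_df * time_df + t_df * cons_df + t_df * time_df * cons_df)
# Error (ii): what remains between board halves after time and time x consignments.
error_ii = between_halves - (time_df + time_df * cons_df)

print(between_boards, between_halves, within_pairs, error_i, error_ii, pieces - 1)
```

The three strata add up to one less than the number of test-pieces, as the grand total requires.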

IV. Some Problems. 


The examples given do not by any means exhaust the need for
statistical methods in the work of the Laboratory. Under this heading
are given three problems in which it is evident that statistical
methods will prove helpful.

* The author is indebted to Mr. M. S. Bartlett for this analysis. 




(a) The difficulty in collecting information about a species is that
the species can never really be sampled in a representative manner.
Very few species are limited in their locality of growth, and differences
in locality are certainly accompanied by differences in properties.
In addition, there are large areas of forest as yet unopened, but which,
with growing demand, may yet be turned into sources for the markets
and provide timber of the same species as are already on the market
but of very different properties. The solution seems to be the making
of smaller classifications. If the published values of any property
are to be used in practice, they will be the more useful the more closely 
they are based on the wood to be used. Now, each market in which 
timber is bought has its own particular sources, and it should be 
possible to sample these markets from time to time and obtain values 
with less variation than the corresponding values for the whole 
species, and therefore more useful to the user. The collection and 
publication of information classified only by species amount to the 
rejection of the information conveyed by knowledge of the locality 
of growth. 





Fig. 2. Load-deflection curves; panels (a), (b) and (c).

(b) Another wide field as yet unexplored is the statistical distribution
of a property within a single test-piece. Since there are variations
in almost every property between different pieces of wood, it is
logical to assume that there are variations within a piece. The effect
of these variations will be different according to the property being
measured. In some cases, for example density, the measurement
made on a piece will be the average of the densities of the elements
of which the piece is composed; for some of the strength
properties, on the other hand, the measurement will not be any such
simple function of the strengths of the elements. Consider, for instance,
the yield-strength. The load-deflection curve for an ideal
material might be as Fig. 2a, which shows a critical load at which the
deflection increases indefinitely without increase of load. A test-piece
made up of a number of elements each with a different yield-strength
(Fig. 2b) would give an over-all load-deflection curve like
Fig. 2c in which the yield-point is the yield-point of the weakest
element.

SUPP. VOL. IV. NO. 2.

The simple relationships which possibly exist between the properties
of wood will only be true for the unit which is homogeneous in
its properties, so that it is necessary to interpret the behaviour of the 
test-piece (itself a small population of homogeneous units) in terms of 
the distribution of the properties within the test-piece. 
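The contrast drawn in problem (b) between an averaging property and a weakest-element property can be put in a few lines. The element values below are hypothetical and serve only to show the two rules side by side; equal-sized elements are assumed.

```python
# Hypothetical element values for one test-piece (illustrative numbers only).
element_density = [0.41, 0.39, 0.44, 0.40]
element_yield_load = [520.0, 480.0, 610.0, 555.0]

# Density of the piece: the average over its elements (equal-sized elements assumed).
piece_density = sum(element_density) / len(element_density)

# Yield-point of the piece: that of the weakest element, as in Fig. 2c.
piece_yield_load = min(element_yield_load)

print(round(piece_density, 2), piece_yield_load)
```

The measured yield-point therefore depends on the lower tail of the distribution within the piece, not on its mean, which is why the distribution itself has to be studied.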

(c) A third problem is that of precision and severity of test. 
Tests are often made on a few articles with the object of distinguishing 
between the parent populations—for example, a few packing-cases of 
one type compared with a few of another may be the basis of a decision 
to buy thousands of one kind rather than the other. It is not enough 
to calculate by statistical methods whether or no a significant differ¬ 
ence has been shown by the trial; we ought first to consider : 

(i) What are the smallest differences of economic importance 
in the parent populations when translated into terms of a small 
sample ? 

(ii) If the tests are more severe than practice, as laboratory
tests often are, what is the smallest difference of economic im¬ 
portance when translated into terms of the more severe labora¬ 
tory test ? and 

(iii) Whether the trial is precise enough to discover the 
smallest differences of economic importance as translated in (i) 
and (ii). 

(iv) If the measurement made is a count (as in the case of 
packing-cases, the proportion of the contents damaged), what 
is the best degree of severity to choose bearing in mind that 
severity influences precision ? 

(i) and (iii) can be treated by well-founded statistical methods, but 
(ii) involves an enquiry into the exact relation to practice of the test 
made. This enquiry is not often made, yet is extremely important, 
especially in the case of accelerated tests, where the shortened time is 
brought about by increased severity. 

Summary. 

A brief description of the work and scope of the Forest Products 
Research Laboratory is given. The variability of wood is discussed
as it concerns the user and the experimenter. The problems of 
description and experiment are illustrated by examples on identi¬ 
fication, beetle attack, bending, change of strength brought about 
by sap stain, relation of strengths measured in different ways, relation 
of strength to structure. Three general problems are given, and the 
use of statistical methods in their solution is discussed. 





Acknowledgments. 

The author is indebted to his colleagues at the Forest Products 
Research Laboratory for the data used in the examples, to Professor 
R. A. Fisher and Professor E. S. Pearson for assistance from time to 
time, to Dr. J. Wishart and Mr. M. S. Bartlett for valuable criticism, 
and to Mr. W. A. Robertson, Director of Forest Products Research, 
for permission to publish. 


Literature. 

1 W. W. Barkas, E. D. van Rest and W. E. Wilson, Forest Products Research
Bulletin 13, 1932.

2 P. Harris, Emp. For. J., 1934, 13, 63.

3 B. J. Rendle and S. H. Clarke, Tropical Woods, 1934, 38, 1; 40, 27.

4 E. A. Parkin and E. D. van Rest, Nature, 1933, 132, 445.

5 W. C. Stevens, Forest Products Research Record, No. 10, 1936.

6 R. C. Geary, J. Roy. Stat. Soc., 1930, 93, 442.

7 R. C. Fisher and S. H. Clarke, Forest Products Research Bulletin 2.

8 E. A. Parkin, Ann. App. Biol., 1934, 21, 495.

9 W. P. K. Findlay and C. B. Pettifor, Forest Products Research (unpublished).

10 S. H. Clarke, Nature, 1937, 139, 511.

11 V. D. Limaye, Indian Forest Records, 1933, 18, Part X; Markwardt and
Wilson, Tech. Bull. U.S.D. Agric., No. 479, 1935; Forest Products
Research Laboratory, unpublished strength data.

12 E. W. J. Phillips, Forest Products Research Project 18, Progress Report,
No. 6, 1936.

13 S. E. Wilson, Ann. App. Biol., 1933, 20.

14 F. Yates, Ann. Eugenics, 1936, 7, 121.


Discussion on Mr. van Rest’s Paper.

The Chairman : It is with very great pleasure that I move a 
vote of thanks to the reader of the paper. 

It is difficult for those who, like myself, are not mathematicians, 
to follow the mathematical parts of Mr. van Rest’s paper. I can, 
however, speak from first-hand knowledge of the advantages which 
accrue from methods of analysis such as Mr. van Rest has described. 

The Forestry Commissioners have been engaged for seventeen 
years or so in research into biological subjects such as the production 
of timber by various species under various conditions, and at first 
we were content just to make experiments and be convinced by the 
results. But the Research Officers soon found that people in the 
field had made their own experiments and come to quite different 
conclusions. That led us to consider the whole procedure in re¬ 
search, and in due course we came down to methods designed to 
test the significance of results. That procedure has given the research
people greater confidence in their own results, and I think
has, after several years, come to be recognized by the field officers 
as really valuable. 

From recognizing the necessity of applying statistical method to 
the data obtained by experiment, it was only a short step to apply¬ 
ing the same reasoning to the design of experiments, which is equally 
important, particularly with experiments such as we have designed 



204 


Discussion 


[No. 2, 


to run on perhaps twenty, thirty or forty years. It is no use carry¬ 
ing out an experiment for forty years and then finding it was so 
designed that in the long run it tells nothing. 

I myself, in my connection with the Forest Products Board, 
have been struck by the desirability of applying the same methods 
to timber problems, for example, timber testing and other problems 
such as Mr. van Rest has indicated, and I heartily congratulate 
him on bringing to bear these methods of statistical analysis. I 
suppose timber is one of the most variable of the materials used in 
the raw state, and I feel that the results of tests of strength and other 
properties of timber, have frequently been put out in such a way 
as to cause much waste in the design of structures. I hope therefore
that the Forest Products Laboratory and Mr. van Rest will
persevere with the procedure which he has outlined this evening. 

I have much pleasure in asking Mr. Bartlett to second the vote 
of thanks. 

Mr. Bartlett : In seconding the vote of thanks to Mr. van 
Rest, I cannot claim to speak with authority in his particular sphere 
of work, and after the Chairman I speak with some diffidence. I 
do, however, welcome this opportunity of learning something of 
the problems dealt with by the Forest Products Research Laboratory, 
and I congratulate Mr. van Rest on the able manner in which he 
has brought these problems before us, and on the welcome innovation 
of the film. 

I feel more at home with the statistics of the paper. I liked the 
author’s early recognition that the use of statistical methods 
essentially depended on the inevitable variation in our population, 
whatever that population might refer to. I do not think that 
Mr. van Rest really meant to belittle the use of a statistical average, 
which might be the best single measure available. Certainly the 
statistical average by itself is hardly ever enough, and various firms 
seem to be beginning to appreciate this. One well-known firm in 
Regent Street advertises that it can supply us with shirts with a 
choice of lengths of sleeve!

The application in the paper of the Poisson distribution interested 
me. The discussion of the problem was perhaps a little brief. If 
a frequency distribution has no theoretical zero class, such a class 
may not be conceivable, as Mr. van Rest assumed in his example. 
On first reading the paper, I thought that possibly the class was 
conceivable, but not observable or else simply neglected. I am not 
so sure now, especially after seeing his slide showing the well-defined 
structure of the groups (and the relationship between the number 
n of vessels in a group and the number n — 1 of dividing bars 
developed), about the relevance of my alternative in this particular 
problem; but it seems worth considering. 

If a Poisson distribution has for any reason the zero class missing, 
the equation of estimation for the mean m will be

$$\frac{m}{1 - e^{-m}} = \bar{x},$$

where $\bar{x}$ is the observed sample mean. This equation can easily
be solved by some iterative method. For the present example,
$\bar{x}$ = 1.1811, m is estimated to be 0.34268, and the expected frequencies
are

    Number in group     Expected frequency
    0                       (2458.8)
    1                         842.6
    2                          ..
    3                          ..
    4 or over                   1.5
    Total                    1005.0

evidently giving a very good fit. 

I have very much pleasure in seconding the vote of thanks to 
Mr. van Rest. 
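
The fit described by Mr. Bartlett can be sketched in a few lines. This is only an illustration: the speaker says no more than "some iterative method", so the Newton iteration, the starting value and the function names below are assumptions, and the figure of 1005 observations is taken from the total quoted above.

```python
import math

def truncated_poisson_mean(xbar, tol=1e-12, max_iter=100):
    """Solve m / (1 - exp(-m)) = xbar for m > 0 by Newton's method (an assumed choice)."""
    if xbar <= 1.0:
        raise ValueError("the mean of a sample without a zero class must exceed 1")
    m = 2.0 * (xbar - 1.0)  # starting value at or above the root
    for _ in range(max_iter):
        f = m - xbar * (1.0 - math.exp(-m))
        fprime = 1.0 - xbar * math.exp(-m)
        step = f / fprime
        m -= step
        if abs(step) < tol:
            break
    return m

def expected_frequencies(m, n_observed, max_class=4):
    """Expected counts for classes 0, 1, ..., max_class - 1 and 'max_class or over'.

    n_observed is the number of observations in classes 1 and above;
    the zero class is the part of the full Poisson series that is missing.
    """
    n_total = n_observed / (1.0 - math.exp(-m))
    freqs = [n_total * math.exp(-m) * m**k / math.factorial(k) for k in range(max_class)]
    freqs.append(n_total - sum(freqs))
    return freqs

m_hat = truncated_poisson_mean(1.1811)
freqs = expected_frequencies(m_hat, 1005)
```

Here `freqs[0]` is the hypothetical zero class and `freqs[1:]` are the classes that can be compared with the observed counts.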


Mr. W. A. Robertson felt he could add little to the discussion 
itself, but he could say for the laboratory that they were beginning 
to realize that they had a very powerful weapon in statistical 
methods for attacking their problems. 

Mr. van Rest had had a hard task to convince his colleagues of 
the power of this new weapon, but he was doing so, and would 
succeed in the end. Although they had to state certain of their 
results in single numbers, as apart from ranges, this latter aspect 
was bound to come more and more into their work as they passed 
from the preparation of simple tables for the use of construction 
to real research problems. Mr. van Rest had done excellent work 
in tackling this subject in the light of Forest Products requirements, 
and for that he was very grateful. 

Professor Pearson wished to add his congratulations to those 
of other speakers, to Mr. van Rest and also, incidentally, to those 
of his colleagues at the Forest Products Research Station who, in 
the last few years, had been working very hard with him to obtain 
recognition of the value of statistical methods in their work there. 

The paper was interesting not so much for the introduction of 
new methods, as in showing how in a research laboratory there were 
numberless problems calling for an attitude of statistical-mindedness:
these were often relatively simple, but in handling them
and getting the most useful conclusions from the data, one must 
have some appreciation of statistical method. It was satisfactory 
to find that the recognition of this fact was spreading in the research 
laboratories in this country. 

Professor Pearson concluded with some remarks on the proposed 
experiment with the Lyctus beetle and also referred to the importance 
of the type of problem mentioned by Mr. van Rest on p. 202, namely,
the closeness of the relation between special experimental tests of 
strength, endurance, etc., and quality in subsequent commercial use. 

Mr. Clarke said Mr. van Rest had stated that the variability 
of timber was an important reason for the existence of the Forest 
Products Research Laboratory, and for the same reason statistical 
methods were an essential part of the Laboratory’s technique. He
wished to show how these methods had been of value in one section
of the Laboratory. 

One example was in the identification of timbers. There were 
about 20,000 different species of timber, and no two pieces of any 
one kind were alike. The Laboratory had each year to identify 
several hundred specimens from trade and other sources. When 
one was faced with 20,000 species, each capable of varying itself 
in some 50 or 60 properties, it was necessary to study the overlapping
of the variations. In the last four or five years a system
had been built up which considerably shortened the time spent in
identification. The basis of the system had been a statistical 
analysis of the variation of timbers. 

Statistical method had also helped in tracing the way in which 
variations in the structure of timber were linked up with variations 
in properties. Nearly all the properties of timber were related to 
specific gravity. Years ago it was discovered that strength properties
were related to specific gravity, but further progress had been
difficult because the influence of specific gravity masked other 
relations. It was not easy to obtain a hundred pieces of timber 
of a single specific gravity and then study other causes of variation 
in strength. The only approach was the statistical way. The 
calculation of regressions with several terms had greatly helped 
to indicate the relative importance of factors influencing the strength. 
Mr. van Rest had mentioned that hickory and ash were of importance 
in the timber trade on account of their unusually high strength in 
resistance to shock; the slide he had shown illustrated the interrelation
of the properties of timber and was an example of the use
of statistical method in studying the properties which rendered 
certain timbers especially suitable for particular work. 

It was possible to deal statistically with three or more variables, 
the relations of which would require more than three dimensions 
to represent graphically, and which could hardly be conceived
mentally. Since Mr. van Rest had brought certain methods to the 
notice of the section, advance in some directions had been quite 
four times as rapid as before. 

Mr. Tippett congratulated Mr. van Rest on his paper, 
which testified to his excellence as an exponent of statistical 
methods. 

He would like to underline what Mr. van Rest had said on p. 186. 
“It is necessary frequently to insist on this conception of a distribution,
for many experimenters are still prone to believe in the
existence of a single true value for the property they are measuring.”
He would go further, and say it was because experimenters
were prone to such a belief that they had to be converted, often 
with considerable labour, to the use of statistical methods. This 
was because of a defect in the system of education of most scientists— 
chemists, physicists, engineers and even biologists. They were 
not taught elementary statistical ideas. This Section of the Royal 
Statistical Society was working for the application of statistical 
methods to industry, and in its propaganda was attacking firms
and industrialists; he would suggest that it might turn its attention
to University authorities and get statistics put into the curriculum 
for all first-year scientific students. 

His second point was more statistical. On p. 185 Mr. van Rest 
referred to the “ factor of safety.” All the timber in a batch must 
not be regarded as having an average strength; allowance must be 
made for the weakest specimen. In a footnote he related that 
factor to parameters of the normal distribution, although probably 
only as an example. Mr. Tippett thought there was a considerable 
amount of difficulty here. Any relation of the frequency of 
abnormally weak specimens of timber to the parameters of a normal 
or any other form of distribution depended upon the distribution 
being homogeneous. It might be of a normal or some other standard 
form around the average, but these abnormally weak specimens 
might not be part of that population but of another, quite separate 
population. Experiments or investigations that were done to relate 
the factor of safety to such a distribution, would have to be done 
on a colossal scale. For example, if it were decided that the factor 
of safety must be large enough to make the probability of failure 
of an individual specimen one part in a million, it would be necessary 
to do a test with several million specimens to determine the factor, 
and there was no sound short cut. For this reason he thought that 
perhaps they might in practice have to rely upon large-scale 
experience, rather than laboratory tests for these factors of safety, 
as the laboratory tests could only be done on a limited number of 
observations. 
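
The scale of testing Mr. Tippett has in mind can be illustrated by a short calculation. It is a sketch only, resting on the assumption that specimens fail independently with a common probability; the function name and the 95 per cent. level are illustrative choices that do not appear in the discussion.

```python
import math

def specimens_needed(p_fail, confidence=0.95):
    """Smallest n for which at least one failure is seen with the stated
    probability, assuming independent specimens: 1 - (1 - p)^n >= confidence."""
    return math.ceil(math.log(1.0 - confidence) / math.log1p(-p_fail))

n_required = specimens_needed(1e-6)
```

For a failure probability of one part in a million the answer is of the order of three million specimens merely to expect to see a single failure, which agrees with the remark that several million would be needed to determine the factor.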

Miss Pettifor expressed her appreciation of Mr. van Rest's
paper. To illustrate another valuable application of statistics, 
when different kinds of fungus attacked the timber, one effect was 
often a decrease of strength depending upon the period and extent 
of the fungus attack. In carrying out series of experiments it was 
desirable to have groups of test pieces which could be exposed to 
the fungus for varying periods. Before statistical methods were 
used the timber was taken in certain groups, each group having its 
own separate controls. Timber was so variable that the control 
groups could not be matched and the variation was so great that 
the figures had to be juggled before conclusions could be drawn. 
Now that the Latin square formation was used, they could match 
the different groups together with the controls and get very accurate 
comparisons. 

Mr. van Rest spoke about small classifications being necessary 
in the description of timbers. This was often impracticable. The 
timber most commonly used, Baltic redwood or deal of the builder, 
was collected from the Baltic and North European countries, from 
a good many ports of origin. An architect could therefore hardly 
specify any particular port of origin for his timber—assuming that 
there were different properties between different ports—and still 
less could an ordinary builder be expected to do so. The ports 
collected timber from areas hundreds of square miles in extent, 
and from trees grown on quite different sites, with differences of
climate, soil and other conditions, so that timber from any one
port might have a large amount of variation. 

Mr. van Rest, in reply, said: Thank you for your patient
reception of my paper. I feel that the opportunities of discussing 
problems which this Section offers are of great value to people like 
myself. 

Several points needing reply have been raised. Mr. Bartlett has
described the calculation of an alternative Poisson distribution to fit
the data on vessel groups and egg groups. I had considered whether
the zero class should be considered as inconceivable or merely unobservable,
and was led to the first conclusion chiefly because I could
in neither case make a reasonable picture of such a class. For
instance, in the case of groups of eggs in vessels it might be said that
the zero class consisted of those vessels containing no eggs. But the
actual number of empty vessels in the wood examined for the
recorded observations was much greater than the calculated requirement
of 95 empty to 279 not empty. This, of course, does not prove
that the zero class does not exist, and I agree that the possibility
should be borne in mind. I have calculated (a) three Poisson series
using the transformed variate (n - 1) and having respectively the
observed means of the data quoted, and (b) three calculated by the
formula given by Mr. Bartlett, and having the same means of
observations as (a). None of the corresponding pairs of series
differ significantly. This is to be expected, since if the probability
of an event is small, and the observations therefore describable by a
Poisson series, the probability of the event occurring more than once
is also small and equally describable by another Poisson series.

Professor Pearson suggests that I have only touched on the 
problem of relating laboratory tests to practice. That is true; but 
I would emphasize that I am concerned at the moment not so much 
with the imitation of practical conditions, but the study of the more 
severe tests which are often undertaken to save time. It should be 
possible to study the problems involved in extra severity on much 
the same lines as the dosage-mortality curves of entomologists. 
A test which exactly simulates practice is not necessarily the most 
efficient test. For example, a test on egg boxes which damaged none
of the eggs, or even one which damaged all the eggs, would not serve 
to discriminate between boxes of nearly equal protective qualities. 
Statistical reasoning will assist in determining the point at which 
the most efficient test can be made and will further help to relate 
the results of such a test (which can be made on a sample only) to the 
behaviour under practical conditions of the parent population of 
boxes. 

The point raised by Mr. Tippett about the outlying values not
belonging to the theoretical distribution of the remainder (the argument
is not limited to the normal distribution) is a good one. But
surely he is striking at the very foundations of statistical methods in
denying that we can use experimentally determined statistics to
make predictions about limits. It is true that in cases of vital importance
we should use a safety factor over and above such limits (or use
limits of a confidence level higher than usual), but the point is that
part at least of the safety factor can be logically based on a statistical
study of experience.

I agree with Miss Pettifor that the timber from any one port 
might still have a large amount of variation in its properties, but my 
point is that it has less variation than timber of the same species from 
all parts of the world. It may not be practicable for an architect or 
builder to specify the port of origin for his timber, but the alternative 
is that having bought the timber he very often knows (from shipping 
marks) the port of origin and thus has information which should enable
him to set closer limits to the properties in which he is interested
than if he had not that information. 




The Application of Hollerith Equipment to an 
Agricultural Investigation. 

By L. J. Comrie, Ph.D., G. B. Hey, B.A., and H. G. Hudson, B.A.

Introduction. 

The problem of sampling procedure is one that covers a wide range. 
In an agricultural investigation that two of us have conducted, it 
has led to some extensive research, and this has necessitated the use 
of Hollerith equipment to enable the arithmetical operations to be 
performed expeditiously. The agricultural details will not be 
discussed here—it is sufficient to say that the data consist of eight 
observations taken at different times on the wheat grown on each of 
7200 6" lengths of an experimental field. These observations have
been grouped in various ways to give different sizes of sample, and 
the efficiency of each type of sample has been studied by an analysis 
of variance technique. To do this it was, of course, necessary to 
form many sums of squares. The observations were also used in an 
extensive correlation study, for which many sums of squares and 
products were calculated. The use of Hollerith equipment to 
produce the totals for different sizes of sample and to evaluate the 
sums of squares and products forms the subject of this paper. 

Statement of the Problem. 

The data consist of observed values A, B, C, D, E, F, F′ and G*
from each of 7200 6" lengths of a wheat field. The 6" lengths are
grouped into five “strips”, each of which is divided into ten rows of
144 lines each. (See Fig. 1.)

It is desired to perform the following operations: 

(1) To form H = E — G. 

(2) To obtain the totals of A, B, C, D, E, F, F′, G and H for the
whole experiment.

(3) To add the corresponding observations A, B, C, D, F, G and
H for 6" lengths to give totals for all rectangular areas of lengths
6", 12", 18", 24", 36", 48", 72" and 144" and widths 1 row, 2 rows,
5 rows and 10 rows. This gives 32 different sizes of plot, ranging
from single 6" lengths to 144" by 10 rows.

(4) To find $\Sigma xy$ where x = A, B, C, D, F, G, H and y = A, B,
C, D, F, G, H for each of the 32 different sizes of plot given above, the
summation being over the whole experiment.

* It has been considered best to retain the notation actually employed in
the course of the experiment.


Fig. 1.—Sub-division of a Strip of the Experimental Field into Rows and Lines.

(5) To obtain the frequency distributions of A, B, C, D, F, G and
H for all sizes of plot. 

(6) To provide the means of taking random samples of plots of 
various sizes. 
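
Operations (3) and (4) can be sketched in modern terms as follows. The sketch assumes that the plots of each size tile a strip without overlapping, which the paper does not state in so many words, and the function names and the toy data are hypothetical.

```python
LENGTHS = [1, 2, 3, 4, 6, 8, 12, 24]  # in lines of 6 inches: 6", 12", ..., 144"
WIDTHS = [1, 2, 5, 10]                # in rows

def plot_totals(strip, length, width):
    """Totals of one variate over non-overlapping plots of `length` lines by
    `width` rows; strip[row][line] holds the value for one 6-inch length."""
    n_rows, n_lines = len(strip), len(strip[0])
    return [
        sum(strip[r][c]
            for r in range(r0, r0 + width)
            for c in range(c0, c0 + length))
        for r0 in range(0, n_rows, width)
        for c0 in range(0, n_lines, length)
    ]

def cross_product_sum(x_totals, y_totals):
    """Sum of products of two variates over the plots of one size."""
    return sum(x * y for x, y in zip(x_totals, y_totals))

# Toy strip of 10 rows by 144 lines (made-up values, not the wheat data).
strip = [[(r + c) % 7 for c in range(144)] for r in range(10)]
sizes = [(length, width) for length in LENGTHS for width in WIDTHS]
pairs = plot_totals(strip, 2, 1)
squares = cross_product_sum(pairs, pairs)
```

Each of the 32 entries of `sizes` gives one set of totals per variate, and every pair of variates then gives one sum of products per size.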

Description of the Equipment.

Cards and Punches .—The basic unit in Hollerith equipment is a
card 7⅜" × 3¼", divided into 80 vertical columns as shown in Fig. 2.
Each column contains 12 vertical positions named, from top to
bottom, Y, X, 0, 1, ... 9. A group of several columns is called a
field, and in each field a number (within the capacity of the field)
may be entered by punching a hole in each column. In the key
punch the card is carried by a movable carriage, each key depression 
driving a knife through the card and releasing the carriage so that 
it moves to the next column. A first-class operator will punch about 
200 cards an hour; the number of errors made in punching is only a 
fraction of the number that would be made in copying. 

The Verifying Punch .—This is similar in appearance to the key 
punch, and is operated in the same way. Instead of being fitted with 
knives it has plungers, which pass through the holes already made 
in the card. If in any column the key depression does not correspond 
to the hole already punched the plunger cannot move, so that the 
carriage movement is arrested and the error detected. 

The Reproducing Punch .—This machine transfers the punching
in any columns of one series of cards to any columns (not necessarily
corresponding columns) of another series of cards. Any
information common to all the new cards can be punched at the
same time; this is known as “gang-punching”. The original cards
pass underneath a series of 80 brushes, and, as soon as a hole comes
opposite a brush, current passes. The currents from these brushes
are led to a plugboard that can be altered at will, and thence to
electrically operated punches over the columns of the new series of
cards. The old and new cards are always at the same phase relative
to brushes and punches respectively, so that the hole in the new card
is punched through the same figure as the hole in the original card.
The cards are then read or sensed by two more sets of brushes which
are connected, again through a plugboard, to a series of relays.
These relays close only when the holes have been punched in corresponding
positions; the machine stops if all the relays brought into
use do not close, thus detecting a mis-punched card. For gang-punching
the current from any of the brushes reading the new cards
can be led back to the corresponding columns of the punch magnets,
with the result that information punched on the first card of any
series will be automatically punched on all subsequent cards. Hence
if the first and last cards of the series are correctly gang-punched all
the other cards must be correct. The first of the new cards is hand-punched
for this purpose and is called the leading card.

The Sorter .—The function of the sorter is to separate the cards
into groups according to the holes punched in any selected column.
13 receptacles are provided for the sorted cards, one corresponding
to each of the 12 positions Y, X, 0, 1, ... 9 in which a hole may be
punched, and one for cards in which no holes are punched. The
machine may be used to arrange cards in numerical order. In this
case they are first sorted on the units column; the ten receiving
pockets or boxes used are then emptied in order from 0 to 9, and the
cards once more sorted, this time on the tens column. The process
is repeated until the cards have been sorted on every column in which
the digits concerned occur. The accuracy of each sort is easily
verified by passing a needle through the appropriate holes in all the
cards withdrawn from any one pocket, or by holding the cards up
to a source of light. The sorter works at the rate of 24,000 cards
per hour.
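
The procedure for arranging cards in numerical order is what would now be called a least-significant-digit radix sort. A minimal sketch follows, in which a card is represented by a string of digits and the function names are hypothetical; only the ten digit pockets are modelled, not the Y, X and blank pockets.

```python
def sort_on_column(cards, column):
    """One pass through the sorter: each card falls into pocket 0 to 9
    according to the digit in `column`, and the pockets are then emptied
    in order from 0 to 9, keeping the cards within a pocket in sequence."""
    pockets = [[] for _ in range(10)]
    for card in cards:
        pockets[int(card[column])].append(card)
    return [card for pocket in pockets for card in pocket]

def sort_numerically(cards, columns):
    """Sort on the units column first, then the tens, and so on.
    `columns` lists the columns of the field from the highest digit to the units."""
    for column in reversed(columns):
        cards = sort_on_column(cards, column)
    return cards

deck = ["426", "017", "427", "105", "426", "009"]
ordered = sort_numerically(deck, [0, 1, 2])
```

The method works only because each pass keeps the order left by the previous one, which is why the pockets must be emptied in strict order.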

The Tabulator .—The purpose of the tabulator is to add the items 
on the cards and print the totals. The process of printing each 
individual item as well as the totals is known as listing; if only the 
totals are printed, the process is tabulating. The cards are fed past 
a series of 80 brushes, to which direct current is supplied. Behind 
the cards are 80 insulated blocks, each connected to a plug socket in 
a plugboard. The adding mechanism consists of a bank of five 
counters, each with nine separate adding wheels. Each wheel is 
represented on the plugboard by a plug socket, so that any card 
column may be connected to any adding wheel by plugging across 
from one hole to another by means of a flexible cord, as on a telephone 
switchboard. The card acts as an insulator between the brushes and 
the blocks; wherever a hole has been punched, current passes through 
and actuates the adding wheel that has been plugged to the column 
containing the hole. 

The printing mechanism is on the right of the machine. The 
type bars are lifted in phase with the passage of the card past the 
reading brushes; the current that causes the adding also releases a 
pawl that arrests the type bar, holding it at its correct level until the 
type is struck by the type hammer. The printing of any individual 
column may be suppressed by locking its hammer; this is known as 
hammer-blocking. When totals are printed any counter may be 
cleared, or zeroised, or it may print only a “progressive total” or
“sub-total”, i.e. it retains its contents intact. Besides the five
print banks that are connected to the corresponding counters, there 
are, on the left, two list banks, with ten type bars each, which will 
print the numbers in any columns plugged to them from the cards. 
When the machine is set to tabulate, the first card of each group 
passes through as if the machine were listing, thus giving an opportunity
once a group for the printing of designating, identifying or
indicative information, which is usually common to all the cards 
in a group. 

The speed of the machine is 6000 cards per hour when listing and 
9000 cards per hour when tabulating. There is a delay of one second 
each time a total is printed, so that the effective speed depends on the 
average number of cards per group. There are, of course, operating 
stops for various reasons such as placing cards in the feeding
mechanism, or new paper in the printing mechanism, so that the
above speeds are not attained in practice. 
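
The dependence of the effective speed on the size of group follows directly from the figures just given. The function below is a sketch with a hypothetical name; it allows only for the one-second delay per total and ignores the operating stops mentioned above.

```python
def effective_cards_per_hour(cards_per_group, rated_speed=9000, total_delay_s=1.0):
    """Cards per hour when tabulating, allowing a fixed delay for each total printed."""
    seconds_per_card = 3600.0 / rated_speed
    seconds_per_group = cards_per_group * seconds_per_card + total_delay_s
    return 3600.0 * cards_per_group / seconds_per_group

rate_for_tens = effective_cards_per_hour(10)
```

With groups of ten cards each group takes four seconds to feed and one to total, so that the rate falls from 9000 to 7200 cards per hour.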

The point at which totals are printed can be controlled automatically,
and this feature is of the utmost importance in the present
application. As long as the numbers punched in the so-called control
field remain constant, the machine goes on adding; as soon as a
change occurs the machine stops, prints the totals (with or without
clearing the counters), and recommences feeding. This sequence is
performed automatically, without any attention on the part of the
operator. Actually the cards are read by two rows of 80 brushes one
card apart, the lower brushes being responsible for actuating the
adding wheels as already described. The control is plugged so
that an effort is made to pass current through corresponding holes
at the upper and lower brushes, and then through relays. In Fig. 3
the machine is controlling on a 3-column field, both the cards being
punched 426. If by the time the 0's have passed the brushes the
circuit PQ is closed, the machine will continue to feed cards; if not
it will stop feeding, print the total, clear if required, and then continue
feeding. In this case, when the line of 6's reaches the brushes the
relay U is closed, when the 4's pass H is closed, and when the 2's
pass T is closed, so that the upper card would be passed on to the
lower brushes for addition. If the following card were punched 427
the relay U would not be closed, so that the end of the group of
numbers 426 would be detected.
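
The logic of automatic control, as distinct from the relay circuit that carries it out, can be sketched as follows. The representation of a card as a pair (control field, amount) and the function name are assumptions made for the illustration.

```python
def tabulate(cards, control, amount):
    """Add `amount(card)` card by card and record a total whenever the
    control field of the next card differs from that of the last."""
    totals = []
    previous = None
    running = 0
    for card in cards:
        key = control(card)
        if previous is not None and key != previous:
            totals.append((previous, running))  # print the total
            running = 0                         # clear the counter
        running += amount(card)
        previous = key
    if previous is not None:
        totals.append((previous, running))      # total for the last group
    return totals

cards = [("426", 3), ("426", 5), ("427", 2), ("427", 1), ("430", 7)]
printed = tabulate(cards, control=lambda c: c[0], amount=lambda c: c[1])
```

If the line that clears the counter is omitted the same routine prints progressive totals, a feature used in the summary multiplication described later.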

The Multiplying Punch .—This machine will sense two numbers 
in any two fields up to eight columns each, multiply them, and punch 
the product or part of it (rounded off as desired) in any unused 
columns of the same card. It will also add the products in a summary 
counter with ten adding wheels. It will multiply by 4-figure multipliers
at the rate of 1200 cards per hour, independently of the size of
the multiplicand, or by 8-figure multipliers at the rate of 700 cards
per hour. Further, if we have numbers A, B and C in different fields
of the same card, it will form expressions of the type A ± B ± C or
A ± BC, and punch the result in any columns of the same card.
In particular it was used to form H = E - G.

Summary Multiplication .—The multiplying punch will produce
in its summary counter the sum of the products of a series of pairs
of numbers each from the same card. However, for groups of more
than about 50 cards, this work can be done at much greater speed
on the tabulator. Suppose that values of A, B, C, D and E are
punched on each of a series of cards, and that we require $\Sigma A^2$,
$\Sigma AB$, $\Sigma AC$, $\Sigma AD$ and $\Sigma AE$. The cards are first sorted on the
units column of A. The tabulator is then plugged so that the five
fields are connected to the five counters of the machine. The cards
are run through, 9's first, followed by 8's and so on, with a control
on the units column of A, the counters being cleared at each change
of control. We now have the totals corresponding to groups of
cards with 9, 8, 7, ... 1, 0 in the units column of A; call these
$a_9, a_8, \ldots, a_1, a_0$; $b_9, b_8, \ldots, b_1, b_0$; ...; $e_9, e_8, \ldots, e_1, e_0$.

Repeat this with the tens column of A, giving totals $a'_9$, $a'_8$, etc.
If A does not exceed 99, it is evident that

$$
\begin{aligned}
\Sigma A^2 &= 9a_9 + 8a_8 + \ldots + a_1 + 90a'_9 + 80a'_8 + \ldots + 10a'_1 \\
&= a_9 + (a_9 + a_8) + \ldots + (a_9 + a_8 + \ldots + a_1) \\
&\quad + 10\{a'_9 + (a'_9 + a'_8) + \ldots + (a'_9 + a'_8 + \ldots + a'_1)\}
\end{aligned}
$$

and

$$
\begin{aligned}
\Sigma AB &= 9b_9 + 8b_8 + \ldots + b_1 + 90b'_9 + 80b'_8 + \ldots + 10b'_1 \\
&= b_9 + (b_9 + b_8) + \ldots + (b_9 + b_8 + \ldots + b_1) \\
&\quad + 10\{b'_9 + (b'_9 + b'_8) + \ldots + (b'_9 + b'_8 + \ldots + b'_1)\}
\end{aligned}
$$

It will be seen from the second form for $\Sigma AB$ that if the machine,
instead of being made to clear the counters at the end of each group,
prints progressive totals, all we have to do is to add up the first nine
items of any tabulation to give the contribution to the total sum
of products from the digits punched in the column on which we
have been controlling. The tenth line serves as a check on the
tabulator by giving $\Sigma A$, $\Sigma B$, $\Sigma C$, $\Sigma D$ and $\Sigma E$, which are very often
known previously, and are the same for each column of A.

If, after sorting, any digits are missing, a blank card may be 
placed in the pile at this point. By breaking the control it causes 
the progressive total to be printed; it is rejected during the next 
sorting. Alternatively the progressive total immediately preceding 
a gap in the digits must be added as many times as there are missing 
digits. 
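
The method of progressive totals can be checked on a small example. The following is a sketch only: the cards are reduced to pairs (A, B), the function names are hypothetical, and the sorting and control of the tabulator are replaced by a simple grouping on one digit of A.

```python
def progressive_totals(cards, digit_place):
    """Group the cards on one digit of A, run them through 9's first, and
    return the ten progressive totals of B (counter never cleared)."""
    group = [0] * 10
    for a, b in cards:
        group[(a // 10**digit_place) % 10] += b
    running, printed = 0, []
    for digit in range(9, -1, -1):
        running += group[digit]
        printed.append(running)
    return printed

def sum_of_products(cards, n_digits=2):
    """Sum of A*B from the first nine progressive totals for each digit of A."""
    total = 0
    for place in range(n_digits):
        printed = progressive_totals(cards, place)
        total += 10**place * sum(printed[:9])  # the tenth line is only a check
    return total

cards = [(12, 3), (99, 4), (40, 7), (7, 10), (55, 1)]
result = sum_of_products(cards)
check_line = progressive_totals(cards, 0)[9]
```

A digit that is missing from the pack simply repeats the preceding progressive total in `printed`, which corresponds to the blank card, or the repeated addition, described above.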

Details of Application.

Preparation of the Cards .—The observations consist of the values
of A, B, C, D, E, F, F′ and G for each of the 7200 6" lengths. The
first steps are to form H = E - G for each of these, to form the totals
of A, B, C, D, F, G and H for the various rectangular areas, and to
punch the whole of this information on cards in readiness for the
formation of sums of squares and products. All the information
relating to each rectangular area is to be punched on one card.
As the experiment was repeated a year later, each card was designed
to accommodate the data for two years. The totals from the larger
areas become so large that three card designs are necessary; that
for lengths greater than 6" is shown in Fig. 2.

It is necessary to identify or designate each card according to the 
position in the experimental field that it represents, and to the size 
of plot to which it refers. The size of plot is shown in the first two 
columns, column 1 indicating the length, and column 2 the width, 
according to the following code:

    Length of area           Width of area
      in inches                 in rows
    Length    Code           Width    Code
       6       1                1       1
      12       2                2       2
      18       3                5       5
      24       4               10       0
      36       5
      48       6
      72       7
     144       8
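
The code lends itself to a simple look-up. In the sketch below the dictionary and function names are hypothetical; the codes themselves are those of the table above.

```python
LENGTH_CODE = {6: 1, 12: 2, 18: 3, 24: 4, 36: 5, 48: 6, 72: 7, 144: 8}
WIDTH_CODE = {1: 1, 2: 2, 5: 5, 10: 0}

def size_designation(length_inches, width_rows):
    """Digits punched in columns 1 and 2 for a plot of the given size."""
    return f"{LENGTH_CODE[length_inches]}{WIDTH_CODE[width_rows]}"

all_codes = {size_designation(l, w) for l in LENGTH_CODE for w in WIDTH_CODE}
```

Every one of the 32 sizes of plot receives a distinct pair of digits, so that a sort on columns 1 and 2 separates the cards by size.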

The position of each 6" length in the strip is indicated by two
numbers, the row number (where 10 is coded 0) in column 76, and the
line number in columns 78-80. The position of each larger area is
indicated by the numbers belonging to the 6" length that constitutes
its top left-hand corner when looking at the field as shown in Fig. 1.
These two numbers, together with the strip number (column 77)
and other information to be described below, were punched by a
process of automatic reproduction.

Table I.

The Scheme for the Super-Masters.

Columns*   5 5 6 6 6 6 6   6   6   6 6   6   7   7   7   7 7 7   7 7
           8 9 0 1 2 3 4   5   6   7 8   9   0   1   2   3 4 5   6 7

           2 3 4 5 6 7 8   2   2   2 2   2   2   2   2   0 0 1   0 1
           5 6 - - - - -   2   2   - -   -   -   -   -   0 0 2   0 1
           2 6 7 - - - -   5   2   2 -   -   -   -   -   0 0 3   0 1
           5 3 - 8 - - -   5   5   - 2   -   -   -   -   0 0 4   0 1
           2 6 4 - 9 - -   2   5   5 -   2   -   -   -   0 0 5   0 1
           5 6 - - - - -   2   5   - -   -   -   -   -   0 0 6   0 1
           2 3 7 5 - 4 -   5   2   5 5   -   2   -   -   0 0 7   0 1
           5 6 - - - - -   5   2   - -   -   -   -   -   0 0 8   0 1
           2 6 4 - 6 - -   2   2   2 -   5   -   2   -   0 0 9   0 1
           5 3 - 8 - - -   2   5   - 5   -   -   -   -   0 1 0   0 1
           2 6 7 - - - -   5   5   2 -   -   -   -   -   0 1 1   0 1
           5 6 - - - - -   5   5   - -   -   -   -   -   0 1 2   0 1
           2 3 4 5 9 7 5   2   2   5 2   5   5   -   2   0 1 3   0 2
           5 6 - - - - -   2   2   - -   -   -   -   -   0 1 4   0 2
           2 6 7 - - - -   5   2   5 -   -   -   -   -   0 1 5   0 2
           5 3 - 8 - - -   5   5   - 2   -   -   -   -   0 1 6   0 2
           2 6 4 - 6 - -   2   5   2 -   2   -   2   -   0 1 7   0 2
           5 6 - - - - -   2   5   - -   -   -   -   -   0 1 8   0 2
           2 3 7 5 - 4 -   5   2   2 5   -   5   -   -   0 1 9   0 2
           5 6 - - - - -   5   2   - -   -   -   -   -   0 2 0   0 2
           2 6 4 - 9 - -   2   2   5 -   2   -   -   -   0 2 1   0 2
           5 3 - 8 - - -   2   5   - 5   -   -   -   -   0 2 2   0 2
           2 6 7 - - - -   5   5   5 -   -   -   -   -   0 2 3   0 2
           5 6 - - - - -   5   5   - -   -   -   -   -   0 2 4   0 2
           2 3 4 5 6 7 8   2   2   2 2   5   2   5   5   0 2 5   0 3
           5 6 - - - - -   2   2   - -   -   -   -   -   0 2 6   0 3
           2 6 7 - - - -   5   2   2 -   -   -   -   -   0 2 7   0 3
           5 3 - 8 - - -   5   5   - 2   -   -   -   -   0 2 8   0 3
           2 6 4 - 9 - -   2   5   5 -   5   -   -   -   0 2 9   0 3
           5 6 - - - - -   2   5   - -   -   -   -   -   0 3 0   0 3
           2 3 7 5 - 4 -   5   2   5 5   -   2   -   -   0 3 1   0 3
           5 6 - - - - -   5   2   - -   -   -   -   -   0 3 2   0 3
           2 6 4 - 6 - -   2   2   2 -   2   -   5   -   0 3 3   0 3
           5 3 - 8 - - -   2   5   - 5   -   -   -   -   0 3 4   0 3
           2 6 7 - - - -   5   5   2 -   -   -   -   -   0 3 5   0 3
           5 6 - - - - -   5   5   - -   -   -   -   -   0 3 6   0 3
           2 3 4 5 9 7 5   2   2   5 2   2   5   -   5   0 3 7   0 4
           5 6 - - - - -   2   2   - -   -   -   -   -   0 3 8   0 4
           2 6 7 - - - -   5   2   5 -   -   -   -   -   0 3 9   0 4
           5 3 - 8 - - -   5   5   - 2   -   -   -   -   0 4 0   0 4
           2 6 4 - 6 - -   2   5   2 -   5   -   5   -   0 4 1   0 4
           5 6 - - - - -   2   5   - -   -   -   -   -   0 4 2   0 4
           2 3 7 5 - 4 -   5   2   2 5   -   5   -   -   0 4 3   0 4
           5 6 - - - - -   5   2   - -   -   -   -   -   0 4 4   0 4
           2 6 4 - 9 - -   2   2   5 -   5   -   -   -   0 4 5   0 4
           5 3 - 8 - - -   2   5   - 5   -   -   -   -   0 4 6   0 4
           2 6 7 - - - -   5   5   5 -   -   -   -   -   0 4 7   0 4
           5 6 - - - - -   5   5   - -   -   -   -   -   0 4 8   0 4

* For the method of indicating column numbers by signature cards, see
page 223. Thus the above columns are 58 to 77 inclusive.

The first stage in this reproduction is the making of 144 “super-masters”,
the contents of the first 48 of which are shown in Table I.
The remaining 96 super-masters form two repetitions of these, except
that columns 73-75 run from 049 to 144, and columns 76 and 77
from 05 to 12 in blocks of 12. From these super-masters 1440
“master cards” were formed by reproducing the super-masters ten
times, once with each of the following leading cards for gang
punching.

Column

78   79   80

 1    1    2
 2    1    2
 3    2    2
 4    2    2
 5    3    2
 6    3    5
 7    4    5
 8    4    5
 9    5    5
 0    5    5

Column 78 contains the row number, and columns 79 and 80 
the controls for adding horizontally in pairs and fives respectively 
(see below). The first 32 columns of the master cards were after¬ 
wards punched by hand with a selection of Tippett’s Random 
Numbers.* 

We are now in a position to punch the designation on the 6"
cards, which have not been used so far. Five leading cards are
punched in columns 1 and 2 (shape of area) and column 77 (strip)
as follows: 1 1 1, 1 1 2, 1 1 3, 1 1 4 and 1 1 5. The reproducer is
then plugged as shown in Table II and the 7200 cards designated
by running through the 1440 master cards with each of the five
leading cards in turn. This (with columns 1, 2 and 77) completely
identifies the 6" cards, which are then punched with the observations.

Table II.

Reproduction from Master Cards.

                             Controls       Micro-plots   Row   Line

Wire from master card ...    79 80 66 65    76 77         78    73 74 75

Wire to 6" card ...          70 71 72 73    74 75         76    78 79 [80]

If the masters are sorted on column 58, it will be seen, by referring
to Table I, that the cards in box 2 are those required for the
designation of the 12" cards. With five leading cards punched 2 1 1,
2 1 2, 2 1 3, 2 1 4 and 2 1 5 in columns 1, 2 and 77, and the
reproducer plugged as in Table II, except that column 67, not column
65, is connected to column 73, five runs of the masters that came in
box 2 yield the required 3600 12" cards, designated and ready to be
punched with the 12" data when these have been prepared. The
cards for the remaining 30 sizes of plot are designated in a similar
manner; the masters (or a portion of them) are sorted on one of the
columns between 59 and 64 and the appropriate boxes of cards
taken; if the area covers more than one row, the masters are sorted
on column 78 and only the desired rows retained.

* L. H. C. Tippett. Random Sampling Numbers. Tracts for Computers,
No. XV. (Cambridge University Press, 1927.)

We have now reached the stage where all the cards (32,400 in
number) are designated, and the observations punched on the 6"
cards by hand. We next form H = F − G for each 6" length by
means of the multiplying punch, which punches the values of H as
formed. We are then ready to form the data to punch on the
remaining cards. To do this we utilise the figures punched in the
control columns 70-73. Consider the formation of the 12" lengths.
The fields A, B, C, D, F, G and H are plugged on the tabulator to
appropriate counters; as the numbers are small, counters 1 and 2
take two fields each. We then run the 7200 cards through, controlling
on column 73, which is identical with column 65 of the super-masters
(Table I). Since the cards are still in the order in which
they came from the reproducer (namely strip 1, row 1, lines 1, 2, . . .
144; strip 1, row 2, lines 1, 2, . . . 144; . . . strip 5, row 10,
lines 1, 2, . . . 144) the machine will break control after every other
card, thus automatically adding in pairs down the row. The
machine lists or prints the designation of the first card of each group,
thus giving 3600 12" totals, each with its proper identification. It
only remains, therefore, to punch the totals from the tabulations
just completed on to the 12" cards by hand to obtain a complete set
of 12" data on cards.

The 18" totals are formed from the 6" cards in a similar manner,
controlling on column 72, which comes from column 66 on the
super-masters, and which will cause the control to break after
every third card; they are then punched on the 18" cards by
hand.

To obtain the 6" pairs, fives and tens, i.e. areas 6" long and 2, 5
and 10 rows wide, we first sort the 6" cards strip by strip on columns
80, 79 and 78 in turn, which changes the order in each strip to line 1,
rows 1, 2, . . . 10; line 2, rows 1, 2, . . . 10; . . . line 144, rows
1, 2, . . . 10. By controlling on column 70 (= column 79 on the
masters) we get totals for the rows in pairs; by controlling on column
71 (= column 80 on the masters) we get totals for the first five
rows, and for the last five rows. By controlling on column 80, i.e.
the unit figure of the line number, we get the totals for all the 10
rows in each line. All these totals can then be punched on the
cards that have already been prepared to receive them. The 12"
and 18" pairs, fives and tens are formed from the 6" pairs, fives and
tens in the same way as the 12" and 18" lengths were formed from the
6" lengths. All the remaining totals are formed by adding the
previous cards in pairs by controlling on column 73, which may come
from any column between 67 and 72 of the super-masters. A study
of Table I will enable the method to be followed in detail; the
complete scheme for the use of these controls is as follows:


Column of     Original     Length
masters       length       produced
                 "            "
   65            6           12
   66            6           18
   67           12           24
   68           18           36
   69           24           48
   70           36           72
   71           48          144
   72           72          144


Dashes in Table I correspond to masters that have already been
discarded by sorting on columns 58-64. It will be noted that the
144" lengths have been produced by two distinct routes (6", 12",
24", 48", 144" and 6", 18", 36", 72", 144"); the equality of the totals
obtained affords a very valuable check on the punching and other
work, as every intermediate stage is involved.
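The two-route check can be illustrated as follows. This is a minimal sketch with hypothetical counts for the 24 six-inch lengths that make up one 144" length; the helper `add_in_groups` is an assumed name, not part of the original procedure. Note that the step from 48" to 144" adds cards in threes, as the scheme above implies.

```python
def add_in_groups(values, size):
    """Add consecutive values in groups of `size` (one control break per group)."""
    if len(values) % size:
        raise ValueError("length must be a multiple of the group size")
    return [sum(values[i:i + size]) for i in range(0, len(values), size)]

# Hypothetical counts for the 24 six-inch lengths of one 144-inch length.
six = [(7 * i + 3) % 5 for i in range(24)]

# Route 1: 6 -> 12 -> 24 -> 48 -> 144 (pairs, pairs, pairs, threes).
r12 = add_in_groups(six, 2)
r24 = add_in_groups(r12, 2)
r48 = add_in_groups(r24, 2)
route1 = add_in_groups(r48, 3)

# Route 2: 6 -> 18 -> 36 -> 72 -> 144 (threes, pairs, pairs, pairs).
r18 = add_in_groups(six, 3)
r36 = add_in_groups(r18, 2)
r72 = add_in_groups(r36, 2)
route2 = add_in_groups(r72, 2)

# Equality of the two final totals checks every intermediate stage.
routes_agree = route1 == route2
```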

Summary Multiplication .—We are now ready to proceed with 
the summing of squares and products; the theory of this has been 
described above. We first sort on the units column of A, and then
connect fields A, B, C, D and F to counters 1, 2, 3, 4 and 5
respectively. We control on the column on which we have just sorted, and
make the totals (for digits 9, 8, . . . 0) progressive. A superimposed
control, known as the major control, is applied to columns
1 and 2; when this changes, all the counters are automatically
cleared, ready for the next type of area. This enables the 32,400
cards for all the areas to be put through in one continuous run, each
type of area being sorted separately. We next sort and control on
the tens column of A. It is possible, and usually worth while, to
sort and control on two columns simultaneously if the range of
numbers in these columns is not more than about 30. The whole
of the cards are put through, although the zeros contribute nothing
to the sums of the squares or products; the totals, however, act as a
check. A further check is effected by connecting a card count to
the left-hand end of one of the counters; this counts the number of
card passages in each group, and would thus reveal any missing or
superfluous cards.

We then connect fields B, C, D, F and G to counters 2, 3, 4, 5 
and 1 respectively, and sort and control on the various columns of 
B in turn. Continuing this process for the various fields, we produce 
all the required sums of squares and products. The final additions 
of the progressive totals, and the amalgamation of the contributions 
from the units, tens and hundreds digits, can be done on any avail¬ 
able adding machine. 
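The arithmetic behind summary multiplication by progressive totals may be sketched as below. The card values and the function names are hypothetical; the sketch only assumes what the text states, namely that cards are grouped by one digit of the multiplier field, that totals are accumulated progressively from digit 9 down to 0, and that the contributions of the units, tens and hundreds digits are finally combined.

```python
def digit(value, place):
    """Decimal digit of `value` at `place` (0 = units, 1 = tens, ...)."""
    return (value // 10 ** place) % 10

def progressive_totals(cards, mult_field, sum_field, place):
    """Progressive totals of `sum_field` for digits 9, 8, ..., 0.

    Entry d is the sum of `sum_field` over all cards whose chosen digit of
    `mult_field` is >= d, i.e. a running total that is never cleared
    between digit groups.
    """
    running = 0
    out = {}
    for d in range(9, -1, -1):
        running += sum(c[sum_field] for c in cards
                       if digit(c[mult_field], place) == d)
        out[d] = running
    return out

def sum_of_products(cards, mult_field, sum_field, places=3):
    """Sum of mult_field * sum_field built from digit-wise progressive totals."""
    total = 0
    for place in range(places):
        prog = progressive_totals(cards, mult_field, sum_field, place)
        # Adding the totals on lines 9..1 counts each card once for every
        # unit of its digit; the total on line 0 serves only as a check.
        total += 10 ** place * sum(prog[d] for d in range(1, 10))
    return total

# Hypothetical cards with two fields.
cards = [{"A": 12, "B": 3}, {"A": 7, "B": 10},
         {"A": 105, "B": 2}, {"A": 40, "B": 5}]

sum_ab = sum_of_products(cards, "A", "B")   # sum of products A*B
sum_aa = sum_of_products(cards, "A", "A")   # sum of squares of A
check_line_0 = progressive_totals(cards, "A", "B", 0)[0]  # plain total of B
```

No multiplication of card values is needed at any stage, which is why the work can be done on a sorter and tabulator followed by an adding machine.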

Practical Considerations .—Experience has shown that three things 
are necessary to complete a piece of work on Hollerith equipment 
successfully. Firstly the work must be planned in detail before ever 
a machine is approached; secondly checks must be provided at 
various stages of the work, and thirdly, minute attention must be 
paid to detail whenever working on the machine. The plan of work 
has already been outlined; it remains to comment on some of the 
details of technique. 

Punch operators are liable to make mistakes, and this means 
that they have to repunch cards, including the designating part 
previously punched by the reproducer; because of this it is advisable 
to have the first year data and the designation together. The punch 
has a stop that is set to prevent a card going further into the machine 
than the first column in use, reading from the left; to avoid altering 
this when repunching the designation, this designation is on the 
extreme right. Since the printing of special column headings and 
rulings between fields was dispensed with (on the grounds of 
economy), the reading of the designation is facilitated by having it 
at the end of the card. The difficulty of the first two columns in 
repunched cards is overcome by gang punching a batch of cards 
in these two columns only, as the information there remains constant 
over a considerable time of punching; this position also has the 
advantage of being easily read. The efficiency of punching is
increased by double-spacing all tabulations from which cards have
to be punched, and arranging for all spacing noughts to be printed,
since the punch operator punches 0 to obtain a spacing. These are
made to print on the tabulator by leading some figure (not nought)
into the left-hand wheel of each counter and then hammer-blocking
the corresponding type-bar. As the machine prints only significant
ciphers this has the effect of printing all 0’s although suppressing the
figure that made the 0’s significant. The tabulator also prints the
identification of the first card of each group.

The two list banks, which record the designating information
on the cards, were connected to card columns as follows:

Type-bar ... 1 2 3 4 5 6 7 8 9 10

List Bank I ... 1 H 2 H H C H H 74 75

List Bank II ... 76 H H 77 H H 78 79 80

H = hammer-blocked. C = the control column.

The next problem is that of checking the punching. Experience
leads us to suggest that the punching of the original data should
be checked by the verifying punch or by listing on the tabulator
followed by reading by experienced proof-readers. The data under
discussion were listed and read by inexperienced proof-readers, so
that several errors were not discovered till later. After the 6" cards
had been formed and listed, the totals of A, B, C, D, F, G and H
were known. The punching of the data for the larger areas was
checked by listing the cards, and noting the total. If any total 
in this listing was correct, it was assumed that the punching was 
correct; if incorrect, the mistake was found from the listing by 
comparison with the sheet from which the data were punched. It 
was also necessary to see that the cards on this check-listing were all 
present and in the right order, which is shown by the listing of 
indicative matter. 

In the multiplications it is necessary that the work produced by 
the tabulator should be headed, so that it is possible at a later date 
to discover the card columns to which each counter wheel and list 
bank were connected, and on which the machine was controlling. 
This is done by preceding the run by two “ signature ” cards, the 
first being punched in each column with the tens figure of the number 
of that column, and the second with the units figure of the number 
of that column. Hence they automatically record for each column 
of print the card column to which it and the corresponding counter 
wheel are connected. Thus a connection to the five consecutive
columns 38-42 would give the signature

3 3 4 4 4
8 9 0 1 2

Care must be taken that these cards are removed when not required, 
and also that they are re-inserted whenever the plugboard is altered 
in any way. 
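The signature-card device is easily reproduced. The sketch below assumes an 80-column card and uses hypothetical function names; it builds the two cards (tens figures and units figures of the column numbers) and reads off the signature printed for any set of connected columns.

```python
def signature_cards(n_columns=80):
    """Return the two signature cards as lists of digits, one per card column.

    The first card carries the tens figure of each column number and the
    second the units figure.
    """
    tens = [(col // 10) % 10 for col in range(1, n_columns + 1)]
    units = [col % 10 for col in range(1, n_columns + 1)]
    return tens, units

def signature(columns):
    """Two printed lines identifying the card columns wired to the print bank."""
    tens, units = signature_cards()
    top = "".join(str(tens[c - 1]) for c in columns)
    bottom = "".join(str(units[c - 1]) for c in columns)
    return top, bottom

top_line, bottom_line = signature(range(38, 43))
```

Reading the two printed lines vertically gives back the column numbers 38, 39, 40, 41 and 42.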

Since there is a control on the column on which the cards have 
been previously sorted, and a major control on columns 1 and 2, 
any card that is in a wrong group will break one of the controls, 
and thus reveal its presence. The group containing the stranger 
will be interrupted and totalled; a separate group will be made of the 
stranger and its total, and then the main group will be resumed. 
Since the identification of the first card of each group is printed on
the list banks, it is possible to trace the stranger card and rectify the
error in the group to which it really belongs without re-tabulating.
Checking by comparing the totals on line 0 with their known values
was effected in every tabulation. The additional check provided by
counting the number of cards in each group proved to be of value, 
especially in one case where two cards were punched with the separate 
data for the two halves of an area instead of being combined into 
one card—a mistake that was not found by the total check. 

Provision has been made in the ten columns of random numbers 
for the taking of random samples of any size from the experiment. 
These random numbers were punched on the cards by reproduction 
from the master cards when the identification was reproduced. In 
the formation of these master cards the whole of Tippett’s Random 
Numbers was punched; this has a value entirely independent of the 
present experiment, since these numbers may now be used in any 
Hollerith investigation by the simple process of reproducing the 
appropriate columns of the master cards. Before using these 
random numbers it is necessary to sort on at least three columns, to 
ensure that the cards are really in a random order before taking a 
batch out as a sample; this is analogous to shuffling a pack of playing 
cards before dealing. 

In conclusion we would remark that the method of summary 
multiplication here described is not put forward as new, although 
doubtless many workers who use analysis of variance and multiple 
correlations are not acquainted with it. It would not have been 
possible—from the points of view both of time and expense—for 
this investigation to have been completed without the aid of the 
equipment we have described. It is possible to pass from 57,600 
(= 8 x 7200) observations to the formation of nearly 200,000 new 
totals and the formation of the totals of nearly a million products in
the comparatively short space of two months. The portion of our 
investigation described would have taken twenty times as long if the 
work had been done by ordinary calculating machines. 





Significance Tests which may be Applied to Samples from any 
Populations. II. The Correlation Coefficient Test. 

By E. J. G. Pitman. 

1. This paper continues the discussion of significance tests which 
involve no assumptions about the forms of the populations sampled. 
Here we are concerned with tests for dependence, but the develop¬ 
ment follows very closely that of the previous paper (1).* 

2. Significant, Nonsignificant, and Doubtful Pairings and 
Samples. 

Suppose that we have two sets of n numbers,

\[ x_1, x_2, \ldots, x_n; \qquad y_1, y_2, \ldots, y_n, \]

with means $\bar{x}$ and $\bar{y}$ respectively. The numbers of one set may be
paired with the numbers of the other set in $n!$ ways. For any
pairing such as

\[ (x_{p_1}, y_{q_1}),\ (x_{p_2}, y_{q_2}),\ \ldots,\ (x_{p_n}, y_{q_n}), \]

we may calculate a correlation coefficient r defined by

\[ r = \frac{\sum (x_{p_m} - \bar{x})(y_{q_m} - \bar{y})}{\sqrt{\sum (x_{p_m} - \bar{x})^2 \cdot \sum (y_{q_m} - \bar{y})^2}}
     = \frac{\sum x_{p_m} y_{q_m} - n\bar{x}\bar{y}}{\sqrt{\sum (x_{p_m} - \bar{x})^2 \cdot \sum (y_{q_m} - \bar{y})^2}}. \]

Let M be a fixed integer less than $n!$. Consider any particular
pairing R. If there are not more than M pairings with a correlation
coefficient equal to or greater than that of R in absolute value, we
shall call R significant. If there are M or more pairings with a
correlation coefficient greater in absolute value than that of R, we
shall call R non-significant. A pairing which is neither significant
nor non-significant will be called doubtful. Since for all pairings the
denominator of the expression for r is constant, the significant
pairings are those which give the greatest values of $|\sum xy - n\bar{x}\bar{y}|$.
Suppose, now, that X and Y are chance variables and that

\[ (x_1, y_1),\ (x_2, y_2),\ \ldots,\ (x_n, y_n) \]

is a sample of n pairs of simultaneously observed values of X and Y.
We shall call the sample significant, non-significant, or doubtful,
according as the pairing

\[ (x_1, y_1),\ (x_2, y_2),\ \ldots,\ (x_n, y_n) \]

* There is an error in section 6 of that paper. In lines 5 and 6 of p. 129,
instead of “the distribution of w” read “the approximate distribution of
w/(1 − w).”




of the two sets of numbers

\[ x_1, x_2, \ldots, x_n; \qquad y_1, y_2, \ldots, y_n \]

is significant, non-significant, or doubtful. Put $P = M/n!$. If
X and Y are independent, the probability of a sample depends only
on the x values and the y values, and is the same for all possible
pairings of these sets of values. Hence the probability of obtaining a
significant sample is not greater than P, and the probability of
obtaining a non-significant sample is not greater than 1 − P. These
would be exactly P and 1 − P respectively if the probability of
obtaining a doubtful sample were zero, as it would be if we were
actually sampling from continuous populations.

If on obtaining a significant sample we always decide that X and 
Y are not independent, the probability of error when X and Y are 
actually independent is not greater than P. In practice we choose 
M so that P is equal to, or approximately equal to, our usual working 
value, such as 0.05 or 0.01, for the permissible probability of error of
such statements. 

Example. The following pairs of values of the variables X and Y 
are observed:— 

X   1.1   1.2   1.3   1.5   1.9

Y   1.7   1.6   1.9   1.3   1.0.

Is there any evidence of association? Since r is independent of 
scales and origins we may take the origins at the smallest observed 
values of X and Y , and then drop the decimal points. We have 



X   0   1   2   4   8      $\bar{x} = 3$,
Y   7   6   9   3   0      $\bar{y} = 5$,      $n\bar{x}\bar{y} = 75$.

The pairings which give the largest values of $|\sum xy - 75|$ are

X   0   1   2   4   8      Σxy     |Σxy − 75|

Y   9   7   6   3   0       31        44
    9   6   7   3   0       32        43
    7   9   6   3   0       33        42
    6   9   7   3   0       35        40
    0   3   6   7   9      115        40
    7   6   9   3   0       36        39
    9   7   3   6   0       37        38
    6   7   9   3   0       37        38
    0   3   7   6   9      113        38.


If we take P = 0.05, M = 5!/20 = 6. The pairing determined by
the observations gives the sixth largest value of |r|, and is therefore
significant. The sample is thus significant, and we conclude that
X and Y are not independent.
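The enumeration in this example is small enough to repeat by machine. The following is a minimal sketch using the coded values of the example; the variable names are ours, and the test statistic is the quantity $|\sum xy - n\bar{x}\bar{y}|$ used in the text.

```python
from itertools import permutations
from math import factorial

x = [0, 1, 2, 4, 8]
y_observed = [7, 6, 9, 3, 0]
n = len(x)

# n * mean(x) * mean(y), which is 75 for these values.
centre = sum(x) * sum(y_observed) / n

def deviation(y):
    """|sum(xy) - n*xbar*ybar| for one pairing of the y values with x."""
    return abs(sum(a * b for a, b in zip(x, y)) - centre)

observed = deviation(y_observed)

# Number of pairings at least as extreme as the observed one (ties included).
as_extreme = sum(1 for y in permutations(y_observed)
                 if deviation(y) >= observed)

p_value = as_extreme / factorial(n)
```

With M = 6 the observed pairing is significant exactly when `as_extreme` does not exceed 6.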





3 . The Approximate Distribution of r. 

In this test we are concerned with the distribution of r determined
by the $n!$ pairings of two given sets of numbers,

\[ x_1, x_2, \ldots, x_n; \qquad y_1, y_2, \ldots, y_n, \]

when all such pairings are equally probable. For convenience we
choose origins so that

\[ \bar{x} = 0 = \bar{y}; \]

we then have

\[ r = \frac{\sum xy}{\sqrt{\sum x^2 \cdot \sum y^2}}. \]

As might be expected, the expressions for the moments of the
r-distribution are simplest in terms of Fisher’s k-statistics,* which
are, for the y values,

\[ k_2 = \frac{\sum y^2}{n-1}, \qquad k_3 = \frac{n \sum y^3}{(n-1)(n-2)}, \]

\[ k_4 = \frac{n}{(n-1)(n-2)(n-3)} \left\{ (n+1) \sum y^4 - \frac{3(n-1)}{n} \Bigl(\sum y^2\Bigr)^2 \right\}. \]

The corresponding statistics of the x values will be denoted by
$h_2$, $h_3$, $h_4$.

For the moments of r we have

\[ E(r) = 0, \qquad E(r^2) = \frac{1}{n-1}, \]

\[ E(r^3) = \frac{n-2}{n(n-1)^2} \cdot \frac{h_3 k_3}{h_2^{3/2} k_2^{3/2}}, \]

\[ E(r^4) = \frac{3}{(n-1)(n+1)} + \frac{(n-2)(n-3)}{n(n+1)(n-1)^3} \cdot \frac{h_4 k_4}{h_2^2 k_2^2}. \]

Thus, when

\[ \frac{h_3}{h_2^{3/2}}, \quad \frac{h_4}{h_2^2}, \quad \frac{k_3}{k_2^{3/2}}, \quad \frac{k_4}{k_2^2} \]

are not too large, and n is sufficiently large, we have

\[ -1 \le r \le 1, \qquad E(r) = 0, \qquad E(r^2) = \frac{1}{n-1}, \]

and, approximately,

\[ E(r^3) = 0, \qquad E(r^4) = \frac{3}{n^2 - 1}. \]

* This remark applies also to the corresponding expressions in the previous 
paper (1). The expressions for the moments of r which are given here may be 
obtained directly without much labour. 





Now, these are the first four moments of the continuous distribution
from −1 to +1 with frequency function

\[ \frac{(1 - z^2)^{n/2 - 2}}{B(\tfrac12, \tfrac12 n - 1)}. \qquad \text{(A)} \]

The square of a variable distributed in this way has a $B(\tfrac12, \tfrac12 n - 1)$
distribution, which may therefore be taken as the approximate
distribution of $r^2$, i.e. the probability that $r^2 \ge c$ is approximately

\[ \frac{1}{B(\tfrac12, \tfrac12 n - 1)} \int_c^1 x^{-1/2} (1 - x)^{n/2 - 2}\, dx. \]

Assuming that the sample values are such that the B distribution is
a good approximation to the distribution of $r^2$, we determine c so that

\[ \frac{1}{B(\tfrac12, \tfrac12 n - 1)} \int_c^1 x^{-1/2} (1 - x)^{n/2 - 2}\, dx = P. \]

The significant pairings of the sets

\[ x_1, x_2, \ldots, x_n; \qquad y_1, y_2, \ldots, y_n \]

will be those which give values of $r^2$ greater than or equal to c.
Thus the sample

\[ (x_1, y_1),\ (x_2, y_2),\ \ldots,\ (x_n, y_n) \]

will be significant if

\[ r^2 = \frac{\{\sum (x - \bar{x})(y - \bar{y})\}^2}{\sum (x - \bar{x})^2 \cdot \sum (y - \bar{y})^2} \]

has a value greater than or equal to c.

The table below shows the true and the approximate distribution
of $r^2$ for the sets considered above,

X   0   1   2   4   8

Y   0   3   6   7   9.

   r²        P        P'          r²        P        P'

 0.0500    0.725    0.718       0.3645    0.300    0.281
 0.0605    0.691    0.690       0.3920    0.275    0.259
 0.0720    0.667    0.662       0.4205    0.233    0.237
 0.0845    0.642    0.635       0.4500    0.208    0.215
 0.0980    0.633    0.608       0.4805    0.192    0.194
 0.1125    0.592    0.581       0.5120    0.167    0.174
 0.1280    0.575    0.554       0.5780    0.133    0.136
 0.1445    0.542    0.528       0.6125    0.117    0.118
 0.1620    0.517    0.502       0.6480    0.100    0.100
 0.1805    0.508    0.476       0.6845    0.092    0.084
 0.2000    0.492    0.450       0.7220    0.075    0.068
 0.2205    0.458    0.425       0.7605    0.050    0.054
 0.2420    0.450    0.400       0.8000*   0.042    0.041
 0.2645    0.425    0.376       0.8820    0.025    0.018
 0.2880    0.375    0.351       0.9245    0.017    0.009
 0.3380    0.358    0.304       0.9680    0.008    0.002

P is the true probability (correct to the third decimal place) of
obtaining by chance pairing a value of $r^2$ as great as or greater than
that shown. P' is the probability calculated from the B distribution.
All possible values of $r^2$ are listed except those below 0.05.
Although n is only 5 the agreement is reasonably good. In calculating
the above values of P' the range of integration is from $r^2$ to 1.
It will be noticed that P' is generally less than P, and that the
agreement would be improved if a continuity correction were applied by
taking the lower limit of the integration at the point mid-way
between $r^2$ and the next lower value of $r^2$. This has not been
done because in practice it would usually be troublesome to determine
the next lower value of $r^2$.
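Individual entries of such a table can be recomputed as a check. The sketch below is illustrative only: the function names are ours, the exact probability is found by enumerating all pairings, and the approximate probability is obtained by a simple midpoint-rule integration of the frequency function (A), without continuity correction.

```python
import math
from fractions import Fraction
from itertools import permutations

def r_squared(x, y):
    """Exact r^2 for one pairing, as a fraction (all sums scaled by n)."""
    n = len(x)
    d = n * sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y)
    sxx = n * sum(a * a for a in x) - sum(x) ** 2
    syy = n * sum(b * b for b in y) - sum(y) ** 2
    return Fraction(d * d, sxx * syy)

def exact_tail(x, y, r2):
    """P: proportion of the n! pairings with r^2 at least r2."""
    perms = list(permutations(y))
    hits = sum(1 for p in perms if r_squared(x, p) >= r2)
    return Fraction(hits, len(perms))

def beta_fn(a, b):
    return math.gamma(a) * math.gamma(b) / math.gamma(a + b)

def approx_tail(r2, n, steps=20000):
    """P': probability that z^2 >= r2 when z has frequency function (A)."""
    c = math.sqrt(r2)
    h = (1.0 - c) / steps
    area = h * sum((1.0 - (c + (i + 0.5) * h) ** 2) ** (n / 2 - 2)
                   for i in range(steps))
    return 2.0 * area / beta_fn(0.5, n / 2 - 1)  # both tails, by symmetry

x = [0, 1, 2, 4, 8]
y = [0, 3, 6, 7, 9]

p_exact = exact_tail(x, y, Fraction(4, 5))   # r^2 = 0.8000
p_approx = approx_tail(0.8, len(x))
```

For $r^2 = 0.8$ the exact value is 5/120, in agreement with the tabulated 0.042.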

If z has the distribution (A) above, $\tfrac12(1 + z)$ has a
$B(\tfrac12 n - 1, \tfrac12 n - 1)$ distribution; hence the approximate
distribution of $\tfrac12(1 + r)$ is of this type. It is sometimes more
convenient to deal with r instead of $r^2$, as in correlation of ranks. See § 5.


4 . Comparison with the Test Based on Normal Distribution . 

If X and Y are independent chance variables, each with a continuous
distribution and one of them with a normal distribution, and
if r is the correlation coefficient of a sample of n pairs of values of
X and Y, then $r/\sqrt{1 - r^2}$ is distributed like “Student’s” z for a
sample of n − 1, or, what is more convenient here, r is continuously
distributed from −1 to 1 with frequency function

\[ \frac{1}{B(\tfrac12, \tfrac12 n - 1)} (1 - x^2)^{n/2 - 2}, \]

while $r^2$ has a $B(\tfrac12, \tfrac12 n - 1)$ distribution.

On this result the usual test for significance of an observed 
correlation coefficient is based. This test makes use of the same 
B distribution as the approximate form of the test described above, 
and so both tests will give the same answer; but the spirit of the two 
tests is quite different. Both test the null hypothesis that X and Y 
are independent. One makes definite assumptions about the 
bivariate population sampled and proceeds by comparing the sample 
obtained with all possible samples; the other makes no assumptions 
about the population and compares the sample with all the other 
samples with the same X and Y values but different pairings. 


5 . Correlation of Ranks, 

When the values of the variables X and Y cannot be properly
measured, but can only be arranged in order of magnitude, the
result of our n observations consists of two sets of numbers,

\[ x_1, x_2, \ldots, x_n; \qquad y_1, y_2, \ldots, y_n, \]

each set being a permutation of the integers from 1 to n; $x_m$, $y_m$
denote the ranks of observation m as regards X and Y respectively.
If the variables X and Y are really independent, the pairing of the
ranks is due to chance and any pairing is as likely as any other.
Hence we can calculate a ranks correlation coefficient r and test for
significance as before.

The true distribution of the coefficient of correlation of ranks,
when all pairings of the ranks are equally probable, agrees well with
the approximate distribution, even when n is as small as 6. The
following table gives the true probability P and the approximate
probability P' of obtaining a value of |r| as great as or greater than
that shown, for n = 6. The value of P' was obtained by using the
approximate distribution of $\tfrac12(1 + r)$, in this case a B(2, 2)
distribution. As the difference between adjacent values of r is always

\[ \frac{12}{n(n^2 - 1)}, \]

a continuity correction is easy to apply, and this has been done in
calculating P'.


  |r|        P        P'          |r|        P        P'

 0.0286    1.000    1.000       0.5429    0.297    0.297
 0.0857    0.919    0.914       0.6000    0.242    0.236
 0.1429    0.803    0.829       0.6571    0.175    0.181
 0.2000    0.714    0.745       0.7143    0.136    0.133
 0.2571    0.658    0.663       0.7714    0.103    0.091
 0.3143    0.564    0.583       0.8286    0.058    0.056
 0.3714    0.497    0.506       0.8857    0.033    0.029
 0.4286    0.419    0.432       0.9428    0.017    0.011
 0.4857    0.356    0.362       1.0000    0.003    0.001
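The rank-correlation table for n = 6 can be checked by enumerating the 720 pairings. The sketch below makes two assumptions beyond the text: r is computed from the sum of squared rank differences (which is equivalent to the product-moment coefficient of the ranks when there are no ties), and the continuity correction is taken as half the spacing 12/n(n² − 1). The function names are ours, and the B(2, 2) distribution function 3u² − 2u³ is used, so `approx_p_n6` applies to n = 6 only.

```python
from fractions import Fraction
from itertools import permutations

def rank_r_counts(n):
    """Count the pairings of ranks 1..n giving each value of |r|."""
    ranks = list(range(1, n + 1))
    denom = n * (n * n - 1)
    counts = {}
    for perm in permutations(ranks):
        d2 = sum((a - b) ** 2 for a, b in zip(ranks, perm))
        r = 1 - Fraction(6 * d2, denom)
        counts[abs(r)] = counts.get(abs(r), 0) + 1
    return counts

def exact_p(counts, v):
    """True probability of |r| >= v under equally probable pairings."""
    total = sum(counts.values())
    return Fraction(sum(c for u, c in counts.items() if u >= v), total)

def approx_p_n6(v):
    """P' for n = 6: two-sided tail of B(2, 2) for (1 + r)/2, with the lower
    limit moved half a step (6 / n(n^2 - 1) = 1/35) below v."""
    lower = float(v) - 1.0 / 35.0
    u = (1.0 + lower) / 2.0
    cdf = 3.0 * u ** 2 - 2.0 * u ** 3   # distribution function of B(2, 2)
    return min(1.0, 2.0 * (1.0 - cdf))

counts = rank_r_counts(6)
p_one = exact_p(counts, Fraction(1))
p_next = exact_p(counts, Fraction(33, 35))   # |r| = 0.9428...
```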


6. Comparison with the Analysis of Variance . 

Suppose that

\[ x_1, x_2, \ldots, x_n \]
\[ y_1, y_2, \ldots, y_n \]

are the yields from the plots in two blocks A and B of an agricultural
experiment in which each block is subdivided into n equal plots and
n different treatments are applied. The numbers $x_m$, $y_m$ denote the
yields from the plots in A and B respectively which have been
subjected to treatment m. If the treatments have a real effect we
shall expect positive correlation between the x and y values. If
they have no effect, any correlation which exists will be due to the
chance pairing.

Since we hope to detect positive correlation, our definition of
significant pairing will be altered by replacing “greater in absolute
value” and “equal in absolute value” by “greater” and “equal.”
The significant pairings are now those which give the greatest values
of r, not |r|. Assuming that we can use the approximate distribution,
we determine c so that

\[ \frac{1}{B(\tfrac12, \tfrac12 n - 1)} \int_c^1 (1 - x^2)^{n/2 - 2}\, dx = P; \]

then if

\[ r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \cdot \sum (y - \bar{y})^2}} \]

is greater than or equal to c, we may regard the result of the experiment
as significant and can conclude that the treatments have had a real
effect.

No assumption has been made about the distribution of fertility 
within the blocks; we have not even assumed that the two blocks 
are of the same size. The only assumption is that the allocation of 
the treatments to the plots within a block is arrived at by chance. 
Note that the two sets of numbers are not a sample from a larger 
population; they constitute the whole population. Our test for 
significance is based on the distribution of r determined by chance
pairings. 

Denote the mean of all the x and y values by $\bar{z}$, i.e.

\[ \bar{z} = \tfrac12 (\bar{x} + \bar{y}). \]

By the usual analysis of variance we have

\[ \sum (x_m - \bar{z})^2 + \sum (y_m - \bar{z})^2 = S_1 + S_2 + S_3, \]

where

$S_1 = n(\bar{x} - \bar{z})^2 + n(\bar{y} - \bar{z})^2$, sum of squares due to blocks;

$S_2 = 2 \sum \{\tfrac12 (x_m + y_m) - \bar{z}\}^2$, sum of squares due to treatments;

$S_3 = \sum \{x_m - \tfrac12 (x_m + y_m) - \bar{x} + \bar{z}\}^2 + \sum \{y_m - \tfrac12 (x_m + y_m) - \bar{y} + \bar{z}\}^2$,
residual sum of squares.

Also we have

\[ S_2 + S_3 = \sum (x_m - \bar{x})^2 + \sum (y_m - \bar{y})^2. \]

The usual theory assumes that

\[ x_m = a + T_m + \xi_m, \qquad y_m = b + T_m + \eta_m, \]

where a, b, $T_m$ are constants, the “effects” of block A, block B, and
treatment m respectively, and the $\xi$ and $\eta$ are independent chance
variables (experimental errors) each with the same normal distribution
of standard deviation $\sigma$. On these assumptions, $S_3/\sigma^2$ is
distributed like $\chi^2$ with n − 1 degrees of freedom. If the treatments
have no difference in effect, i.e.

\[ T_1 = T_2 = \ldots = T_n, \]

then $S_2/\sigma^2$ is also distributed like $\chi^2$ with n − 1 degrees of freedom.
The distribution of $\tfrac12 \log_e (S_2/S_3)$ is known to be of Fisher's z form.





Here it is more convenient to use the fact that

\[ \frac{S_2}{S_2 + S_3} = \frac{\tfrac12 \sum (x_m - \bar{x} + y_m - \bar{y})^2}{\sum (x_m - \bar{x})^2 + \sum (y_m - \bar{y})^2} \]

has a $B\{\tfrac12 (n - 1), \tfrac12 (n - 1)\}$ distribution. It is then easy to show that

\[ \frac{S_2 - S_3}{S_2 + S_3} = \frac{\sum (x_m - \bar{x})(y_m - \bar{y})}{\tfrac12 \{\sum (x_m - \bar{x})^2 + \sum (y_m - \bar{y})^2\}} \qquad \text{(B)} \]

is distributed from −1 to 1 with frequency function

\[ \frac{1}{B\{\tfrac12, \tfrac12 (n - 1)\}} (1 - x^2)^{(n-3)/2}. \qquad \text{(C)} \]

If the different treatments really produce different effects, this will
tend to make $S_2$ large, while $S_3$ is not affected. Hence large values
of (B) are regarded as significant.

Thus the analysis of variance uses the expression (B), which on
the null hypothesis (treatments without different effects) and on the
assumptions stated above, is distributed according to (C). The test
developed in this paper uses the expression

\[ \frac{\sum (x_m - \bar{x})(y_m - \bar{y})}{\sqrt{\sum (x_m - \bar{x})^2 \cdot \sum (y_m - \bar{y})^2}} \]

which, on the null hypothesis, is approximately distributed according
to

\[ \frac{1}{B(\tfrac12, \tfrac12 n - 1)} (1 - x^2)^{(n-4)/2}. \qquad \text{(A)} \]


This suggests that perhaps some modification of the analysis of 
variance procedure may be necessary if it is to be freed from its 
present assumptions; but further discussion must be reserved for 
another paper. 
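The algebraic relations between the analysis of variance quantities and the correlation coefficient can be verified numerically. In the sketch below the two blocks of yields are hypothetical (the coded values of the earlier example are reused merely as numbers), and the function name is ours.

```python
import math

def two_block_analysis(x, y):
    """Sums of squares for two blocks of n plots, with statistic (B) and r."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    zbar = (xbar + ybar) / 2
    s1 = n * (xbar - zbar) ** 2 + n * (ybar - zbar) ** 2          # blocks
    s2 = 2 * sum(((a + b) / 2 - zbar) ** 2 for a, b in zip(x, y))  # treatments
    s3 = sum((a - (a + b) / 2 - xbar + zbar) ** 2
             + (b - (a + b) / 2 - ybar + zbar) ** 2
             for a, b in zip(x, y))                                # residual
    sxx = sum((a - xbar) ** 2 for a in x)
    syy = sum((b - ybar) ** 2 for b in y)
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    total = sum((a - zbar) ** 2 for a in x) + sum((b - zbar) ** 2 for b in y)
    return {
        "S1": s1, "S2": s2, "S3": s3, "total": total,
        "sxx": sxx, "syy": syy, "sxy": sxy,
        "B": (s2 - s3) / (s2 + s3),          # analysis of variance statistic
        "r": sxy / math.sqrt(sxx * syy),     # statistic used in this paper
    }

# Hypothetical yields for two blocks of five plots.
res = two_block_analysis([0, 1, 2, 4, 8], [7, 6, 9, 3, 0])
```

The two statistics have the same numerator and differ only in the denominator, an arithmetic mean of the two sums of squares in (B) against their geometric mean in r, so that |B| never exceeds |r|.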

In conclusion, I wish to thank Mr. E. Williams for his assistance 
with the computations, and Dr. J. Wishart for his great kindness in 
seeing this and previous papers through the press. 


Summary . 

A test of dependence is proposed which is based on the correlation 
coefficient of a sample but which makes no assumptions about the 
population sampled. It is shown that an approximate form of the 
test will give the same result as the usual test based on normality. 
Moreover the validity of the approximation is determined by the 
sample values alone, without any reference to the (probably 
unknown) characteristics of the population sampled. 


Reference. 

(1) Pitman, E. J. G. “Significance Tests which may be Applied to Samples
from any Populations,” J. Roy. Stat. Soc. Suppl. 1937: 4: 119-130.





A Catalogue of Uniformity Trial Data. 

By W. G. Cochran, Rothamsted Experimental Station. 

Some Uses of Uniformity Trial Data.

In a field uniformity trial, the area under experiment is divided into 
a number of plots, usually all of the same dimensions; the same 
variety of the crop is grown and the same manurial and cultural 
operations are carried out on each plot. The yield of each plot is 
recorded separately at harvest. In some cases other observations 
are made as well as yield, e.g. stand of plants. In the case of tree 
crops the plot may consist of a single tree or a group of trees. 

The usefulness of a uniformity trial lies in the fact that neighbour¬ 
ing units may be amalgamated to form larger plots of various sizes 
and shapes. The variation in yield over the field due to soil hetero¬ 
geneity, slight differences in the distribution of manures, errors in 
weighing, etc. (generally summed up in the term “ experimental 
error ”), may be calculated for each type of plot formed. The most 
obvious use of the data is to provide information on the optimum 
size and shape of plot, and this is the manner in which the majority 
of the trials given below have been used. In such studies, once the 
optimum size and shape have been determined, the standard error 
per plot and the number of replications required to reach a given 
degree of accuracy in the comparison of the mean treatment yields 
are also of interest. This type of information is not, of course, 
peculiar to uniformity trial data, but is supplied by every properly
designed replicated experiment for the particular type of plot used. 
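
The amalgamation of neighbouring units described above can be illustrated by a short Python sketch. It assumes the unit yields are held as a rectangular grid (a list of rows) and that the field divides exactly into plots of the chosen shape; the function names are hypothetical and no data from the catalogue are used.

```python
from statistics import mean, stdev


def amalgamate(yields, rows_per_plot, cols_per_plot):
    """Sum neighbouring units of a uniformity trial into larger plots.

    `yields` is a rectangular grid of unit yields. Each larger plot is
    rows_per_plot x cols_per_plot units; plot totals are returned row by row.
    """
    n_rows, n_cols = len(yields), len(yields[0])
    if n_rows % rows_per_plot or n_cols % cols_per_plot:
        raise ValueError("field does not divide exactly into plots of this shape")
    plots = []
    for r in range(0, n_rows, rows_per_plot):
        for c in range(0, n_cols, cols_per_plot):
            total = sum(
                yields[i][j]
                for i in range(r, r + rows_per_plot)
                for j in range(c, c + cols_per_plot)
            )
            plots.append(total)
    return plots


def cv_percent(plots):
    """Standard error per plot as a percentage of the mean plot yield."""
    return 100 * stdev(plots) / mean(plots)
```

Calling `cv_percent(amalgamate(grid, r, c))` for several values of `r` and `c` gives the kind of comparison between sizes and shapes of plot for which most of the trials in the catalogue were used.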

A comprehensive study of the uniformity trial data on size and 
shape of plot has been made by Fairfield Smith, 1 who derives from 
them an empirical relation of wide applicability between variance 
per plot and size of plot. 
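
The text does not state the relation itself. The form generally associated with Fairfield Smith's paper is V_x = V_1 / x^b, where V_x is the variance of yield per unit area for plots of x units and b is an index of soil heterogeneity; that form is assumed here, and the function name is hypothetical. Under that assumption, b may be estimated by a straight-line fit of log variance on log plot size, as in this sketch.

```python
import math


def fit_variance_law(sizes, variances):
    """Least-squares fit of log V_x = log V_1 - b log x.

    `sizes` are plot sizes in basic units, `variances` the corresponding
    variances per unit area. Returns (V_1, b). Assumes the empirical form
    V_x = V_1 / x**b, which is not stated in the text above.
    """
    lx = [math.log(x) for x in sizes]
    lv = [math.log(v) for v in variances]
    n = len(lx)
    mx, mv = sum(lx) / n, sum(lv) / n
    slope = sum((a - mx) * (c - mv) for a, c in zip(lx, lv)) / sum(
        (a - mx) ** 2 for a in lx
    )
    v1 = math.exp(mv - slope * mx)  # intercept at x = 1
    return v1, -slope
```

A value of b near 1 would indicate that neighbouring units behave almost independently, and a value near 0 that they are highly correlated.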

Uniformity trials can also be used to compare the relative 
efficiencies of different types of experimental design, and, in particular, 
to test whether any newly proposed design seems suitable for a 
certain crop. For example, Yates 2 tested the efficiency of a new 
method of arranging variety trials on Parker and Batchelor’s uniformity
data with oranges (catalogue 64). Unfortunately only a
small proportion of the trials given below are suitable for comparisons
of this kind. For if a trial is intended to provide information on the
optimum size and shape of plot, as most of the trials are, the smallest
unit harvested requires to be somewhat smaller than the size of plot 
likely to be used in practice, so that various shapes of plot may be 
obtained by amalgamation. In consequence, many trials contain 
only a few plots of the size which is finally recommended. 

The further question whether differences in soil heterogeneity 
from plot to plot in a field persist year after year is obviously of 
practical importance. Several trials have been continued on the 
same site for a number of years, some with the same crop, e.g. the
trials on Ragi discussed by Lehmann (catalogue 77) and some with
varying crops, e.g. the Huntley uniform cropping experiment
(catalogue 1). As a rule, the yields of the same plot in successive 
years have been found to be positively correlated, whether the same 
crop followed or a different crop, but the closeness of the correlation 
has varied considerably. 

The next step was to consider whether these correlations might 
be used to improve the accuracy of field experiments. With a high 
correlation it might clearly be worth while to run a uniformity trial 
as a preliminary to a field trial. The question of how to adjust the 
yields of the final experiment for differences shown in the uniformity 
trial at first caused some difficulty. The introduction of the statistical 
method known as the analysis of covariance, however, provided a 
means of correction free from any element of arbitrariness, and gave 
a stimulus to studies on the value of a uniformity trial as a preliminary
to field experimentation. The results of these investigations
are now well known. With annual agricultural crops, uniformity 
trials have not in general doubled the precision of subsequent field 
trials, whereas they entail approximately double the labour of a field 
trial with no previous uniformity trial, and a year’s delay in the 
experimental results. With perennial plants, such as rubber, for 
example, where each plot consists of the same trees or bushes year 
after year, the gain in precision is decidedly higher, and preliminary 
records may often be obtained without much extra labour, or may 
indeed be part of a standard observational programme. The case 
for a preliminary uniformity trial is then considerably stronger. In 
animal nutrition work, also, the experimental unit is the same in the 
uniformity trial and the actual experiment, and the covariance 
method has proved strikingly successful in some such cases (cf. 
Bartlett). 3 
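
The argument about doubling the precision can be made concrete with a simplified Python sketch. It adjusts the final yields by their linear regression on the preliminary (uniformity trial) yields of the same plots and reports the ratio of the unadjusted to the adjusted sum of squares, which equals 1/(1 - r²). This is a sketch only: a full analysis of covariance would take the regression from the error line of the analysis and would allow for the degree of freedom lost, and the function names are hypothetical.

```python
from statistics import mean


def covariance_adjustment(prelim, final):
    """Adjust final yields for preliminary yields of the same plots.

    Returns (slope, adjusted_yields, gain), where gain is the ratio of the
    sum of squares before adjustment to that after, i.e. 1 / (1 - r**2).
    """
    mx, my = mean(prelim), mean(final)
    sxx = sum((x - mx) ** 2 for x in prelim)
    syy = sum((y - my) ** 2 for y in final)
    sxy = sum((x - mx) * (y - my) for x, y in zip(prelim, final))
    slope = sxy / sxx
    adjusted = [y - slope * (x - mx) for x, y in zip(prelim, final)]
    r2 = sxy ** 2 / (sxx * syy)
    gain = float("inf") if r2 >= 1 else 1 / (1 - r2)
    return slope, adjusted, gain


def correlation_needed(gain):
    """Correlation between years required for a given gain in precision."""
    return (1 - 1 / gain) ** 0.5
```

On this simplified reckoning `correlation_needed(2)` is about 0.71, so the correlation between the preliminary and final yields must exceed roughly 0.7 before the precision is doubled.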

Uniformity trial data have also occasionally been used as a check 
on the applicability to field experiments of the analysis of variance 
and the tests of significance based on it. The mathematical theory 
from which the z table is derived requires the assumptions that the 
experimental yields are normally distributed and that their deviations
from the means about which they vary are uncorrelated. These
assumptions are known to be untrue for field trials. A preliminary
requirement for the application of the analysis of variance to be
possible is that the experimental design used should be chosen at
random from a set of designs such that, in the absence of any treatment
effect, the average treatment mean square over the set should
equal the average error mean square. The repeated use of the same
design, however excellent in itself, is condemned on these grounds,
and Tedin 4 has estimated the bias in the Knut Vik square from a
set of uniformity trial data. The further question arises: how good
an approximation to the tabulated z distribution is generated by the
process of randomization used? There again the question may be
tested from uniformity trial data. One such example has been 
worked out by Eden and Yates, 5 and further examples would be of 
considerable interest. 
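
The requirement that the average treatment mean square should equal the average error mean square over the set of designs can be checked directly on uniformity data. The Python sketch below does this for randomized blocks, taking every possible arrangement of dummy treatments within each block; it is an illustration under that assumption, with hypothetical function names, and is not the calculation of Eden and Yates.

```python
from itertools import permutations, product
from statistics import mean


def mean_squares(blocks, layout):
    """Treatment and error mean squares for one randomized-block layout.

    `blocks` holds the uniformity-trial yields, one list per block.
    `layout[i][j]` is the dummy treatment (0..t-1) on plot j of block i.
    """
    b, t = len(blocks), len(blocks[0])
    grand = mean(y for blk in blocks for y in blk)
    totals = [0.0] * t
    for blk, labels in zip(blocks, layout):
        for y, k in zip(blk, labels):
            totals[k] += y
    ss_treat = b * sum((s / b - grand) ** 2 for s in totals)
    # sum of squares within blocks = treatments + error
    ss_within = sum((y - mean(blk)) ** 2 for blk in blocks for y in blk)
    ss_error = ss_within - ss_treat
    return ss_treat / (t - 1), ss_error / ((b - 1) * (t - 1))


def average_over_randomizations(blocks):
    """Average both mean squares over every within-block arrangement."""
    t = len(blocks[0])
    layouts = product(permutations(range(t)), repeat=len(blocks))
    results = [mean_squares(blocks, lay) for lay in layouts]
    return mean(r[0] for r in results), mean(r[1] for r in results)
```

Full enumeration is practicable only for small trials; for larger ones a random sample of arrangements would serve, and the ratios of the two mean squares so obtained could be compared with the tabulated distribution.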

The large number of uniformity trials which have been carried 
out and the applications mentioned above testify that uniformity 
trial data play an important part in modern research on field technique. 
A catalogue of the uniformity trial data at present available therefore
appears likely to be of value, in order to facilitate further research
in field technique, and also to bring to light unpublished material 
which might otherwise be lost. With this end in view, we have at 
Rothamsted during the last few years been constructing a card index 
of such trials, and we have also encouraged workers with whom we 
have come into contact and who have conducted uniformity trials, 
but who have for various reasons been unable to examine the results, 
or who have examined them but have not published their conclusions, 
to file a copy of their material at Rothamsted. The following 
workers have furnished us with material of this nature : 


Worker.            Crop.        No. and size of plots, etc.         Area.               Catalogue No.

C. H. Goulden      Barley       2304 plots 3' x 3'                  ½ acre              8
F. J. Pound        Cacao        Several thousand trees since 1914   -                   14
S. M. Gilbert      Coffee       12,000 tree yields of cherry        -                   18
H. C. Ducker       Cotton       490 plots 1 row x 21'               [illegible]         26
O. V. S. Heath     Cotton       3696 individual plants              -                   28
D. MacDonald       Cotton       1152 plots 3½' x 30' or 40'         3 [illegible] acres 30
A. H. McKinstry    Cotton       480 plots 1 row x 23'               1 acre              31
A. R. Saunders     Maize        250 plots 1 row x 10 plants         -                   40
Huntley (Mon.)     Oats         46 plots 23½' x 317'                8 acres             55
J. Grantham*       Rubber       1000 trees for 10 years             -                   85
H. Evans           Sugar Cane   710 plots 5' x 50'                  6 acres             106
H. F. Smith        Wheat        1080 plots 6" x 1'                  [illegible]         [illegible]




* Grantham’s data on rubber have already been utilized by Murray in 
the paper referred to in the catalogue. 










In most cases in which uniformity trials have been used as the 
basis of published work, but in which the original data have not been 
published, we have written to the authors concerned suggesting that 
they might like to file copies of these data at Rothamsted, so as
to make them accessible to other workers. This suggestion has been
met for nineteen trials. 

Finally, in seven cases the data, although not published, are 
known to have been filed elsewhere. 

The entries in the catalogue can therefore be classified as follows: 


                              No. of entries       No. of trials.       Average no. of plots
                              in catalogue.                             per trial.

Material.                     Field     Trees.     Field     Trees.     Field      Trees.
                              crops.               crops.               crops.

Already published             73        13         135       25         221        223
Not published                 14        1          22        1          554        500
Not published but copy
    at Rothamsted             21        4          28        4          539        3,761
    elsewhere                 5         1          6         1          1,440 *    50


* One entry contains 203 sugar-cane trials each of 36 plots. This entry has 
been omitted when finding the average number of plots per trial. 


As is to be expected from considerations of space, the average 
number of plots per trial is considerably greater for trials the yields 
of which have not been published than for those which have. This 
makes the recovery of such data the more valuable. 

It must not be assumed that in the 14 entries given as “not
published” the data are inaccessible. In some cases we have not
been able to get into touch with the author, perhaps owing to change 
of address, and in others replies have not yet been received, but 
we hope in time to reduce this, and students are meanwhile advised 
to write to the author concerned about such trials. 

This catalogue will not have been in vain if it has rescued from 
oblivion the 32 uniformity trials now filed at Rothamsted. It will 
be more valuable if, as we hope, it encourages other workers at present 
unknown to us, who have carried out uniformity trials, to furnish 
particulars of these, and if possible to make available copies of the 
original data. We should also be grateful for any information on 
omissions from this list. Although it is, we hope, fairly comprehensive 
as regards English (including Empire) and American Journals, we 
make no pretence to have searched the Continental literature at all 
thoroughly. This task we commend to some other worker. 

In conclusion, we must thank the workers who helped in the 
compilation by sending data or information. 







References.

1 H. Fairfield Smith (in the press), J. Agric. Sci., 1937.

2 F. Yates, ibid., 1936, Vol. 26, pp. 424-55.

3 M. S. Bartlett, ibid., 1935, Vol. 25, pp. 238-44.

4 O. Tedin, ibid., 1931, Vol. 21, pp. 191-208.

5 T. Eden and F. Yates, ibid., 1933, Vol. 23, pp. 6-17.


The Catalogue. 

The entries are arranged alphabetically under crops, and for 
each crop alphabetically under author’s names. The information 
given is as follows : the size and shape of the smallest unit harvested, 
its approximate area as a fraction of an acre, and the approximate 
total area (T.A.) occupied by the trial. In some cases complete 
information on these points was not available. 
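
The areas quoted in the entries can be checked from the plot dimensions, taking one acre as 43,560 square feet. The Python sketch below is an illustration with hypothetical function names; the example in the usage note reads the plot width of the first entry as 23½ ft, which is an interpretation of the printed figure.

```python
SQ_FT_PER_ACRE = 43560  # one acre in square feet


def plot_area_acres(width_ft, length_ft):
    """Area of a single rectangular plot, in acres."""
    return width_ft * length_ft / SQ_FT_PER_ACRE


def total_area_acres(n_plots, width_ft, length_ft):
    """Area occupied by all plots, in acres (guard rows and paths excluded)."""
    return n_plots * plot_area_acres(width_ft, length_ft)


def nearest_unit_fraction(area_acres):
    """Denominator n such that 1/n acre is closest to the given area."""
    return round(1 / area_acres)
```

For 46 plots of 23½' x 317', `total_area_acres(46, 23.5, 317)` comes to a little under 7.9 acres, in agreement with the total area of 8 a. quoted for that trial.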

The following symbols have been used to show where the data 
may be found :— 


Published in the paper ...          G
Not published ...                   N
Not published, but filed:
    at Rothamsted ...               R
    elsewhere ...                   E


Notes have occasionally been added to the entries in cases where 
several measurements were made on the crop. 

Alfalfa. 1 . 46 plots, each 23£' x 317' = £ a. T.A. 8 a. G. 

(1) 1912-14. Harris, J. A., and Scofield, 

C. S. Permanence of differences in the plats of 
an experimental field. J. Agric. Res., 20 , 
335-56. 

(2) 1922-23-24. 

Further studies on the permanence of differ¬ 
ences in the plats of an experimental field. 

J. Agric. Res., 36, 15-40. 

2. 36 plots, each -f G a. T.A. 2 a. B. 

3 years 1930-31-32. 

Metzger, W. H. The relation of varying 
rainfall to soil heterogeneity as measured by 
crop production. J. Amer. Soc. Agron., 27, 
274-78. 

3. 175 plots, each 13*2' x 13*2' = ^50 a* G. 
T.A. fa. 

Summerby, R. The value of preliminary
uniformity trials in increasing the precision
of field experiments. Macdonald Coll. Tech.
Bull. 15.





Apples. 


Barley. 


4. 512 individual tree yields. G.

Batchelor, L. D., and Reed, H. S. Relation
of the variability of yields of fruit trees to the
accuracy of field trials. J. Agric. Res., 12, 245-83.

5. 50 individual tree yields. E. 

Collison, R. C., and Harlan, J. P. Variability
and size relations in apple trees. New
York (Geneva), Agr. Exp. Sta. Tech. Bull. 164,
1-38.

Yields filed at the New York State Agricultural
Experiment Station, Geneva, N.Y.

6. 187 individual tree yields. G. 

Strickland, A. G. Error in horticultural
experiments. J. Dept. Agric. Victoria, 1935,
32, 408-16.

Time in weeks for a stored apple to reach 5
per cent. waste and 5 per cent. breakdown.

7. 390 plots, each 4' x 4' = T0 ? jnr a. T.A. -J a. G. 

Bose, R. D. Some soil heterogeneity trials 
at Pusa and the size and shape of experimental 
plots. Ind. J. Agric. Sci., 5, 545. 

8. 2304 plots, each 3' X 3' = a. T.A. \ a. R. 

Goulden, C. H. Unpublished data. 

9. (1) 30 plots, each a. T.A. \ a. G. 

(2) 128 plots, each a. T.A. 1 a. 

Hanson, N. A. Prøvedyrkning paa Forsøgsstationen
ved Aarslev. Tids. for Landbrugets
Planteavl, 21, 553.

10. 46 plots, each 23J' x 317' = £ a. T.A. 8 a. G. 

(1) 1912-14, Harris, J. A., and Scofield, C. S. 
Permanence of differences in the plats of an 
experimental field. J. Agric. Res., 20, 335-56. 

(2) 1922-23-24. 

Further studies on the permanence of differences
in the plats of an experimental field.
J. Agric. Res., 36, 15-40.

11. 234 plots, each 24£' x 34£' = -fo a. T.A. 

4£a. G. 

Kristensen, R. K. Anlæg og Opgørelse af
Markforsøg. Tids. for Landbrugets Planteavl,
31.





Cacao. 


Clover. 


Coconuts. 


Coffee. 

Corn. 


12. 96 plots, each 3*3' x 3*3' = a. T.A. a. G. 

N contents given, but not yields. 

Barbacki, St. Mémoires de l’Institut National
Polonais d’Économie Rurale à Puławy, T. XIV,
No. 213.

13. 500 trees : yields in pods. N. 

Cheesman, E. E., and Pound, F. J. Uniformity
trials on Cacao. Trop. Agric., 9,
277-88.

14. Pound, F. J. Unpublished data of several 
thousand trees since 1914. 

15. 35 plots, each 13*2' x 66 ' = fa a. T.A. f a. 

Each year 1928-32 on different parts of the 
same field. 

Summerby, R. The value of preliminary 
uniformity trials in increasing the precision 
of field experiments. 

16. 60 plots, each of 6 trees. T.A. 12 a. G. 

Joachim, A. W. R. A uniformity trial with
coconuts. Tropical Agriculturist, 85, 4, 198-207.

Yields of nuts over 8 months. 

17. 44 plots, each of 25 palms. 

Yields each year from 1919 to 1928. 

Beckett, W. H. R. 

Randomization in Field Experiment and its
application on experiment stations. Bull. No.
20, Dept. of Agric., Gold Coast. Number of nuts
given.

18. 12,000 individual tree yields of cherry for each R. 
of 3 years. Gilbert, S. M. Unpublished data. 

19. 3 trials, 2304 plots, each 1 hill x 1 row (1) 1923, 

(2) 1925, (3) 1925. T.A. f a. E. 

Bryan, A. A. Factors affecting experimental
error in field plot tests with corn.
Iowa Agric. Expt. Sta. Report, 1930-31, 67.
Individual yields filed with Iowa Agric. Exp.
Sta.

20. 450 plots, each 21' x 68 ' (3J' discard all 

round) == ^ a. T.A. 9 a. G. 

Garber, R. J., McIlvaine, T. C., and Hoover,
M. M. A method of laying out experimental
plats. J. Amer. Soc. Agron., 23, 286-98.







Cotton. 


21. 46 plots, each 23J' X 317' = ^ a. T.A. 8 a. G. 
1915-16. 

Harris, J. A., and Scofield, C. S. 
Permanence of differences in the plats of an 
experimental field. J. Agric. Res., 20, 335-56.

22. 36 plots, each a. T.A. 1J a. E. 

Metzger, W. H. The relation of varying rainfall
to soil heterogeneity as measured by crop
production. J. Amer. Soc. Agron., 27, 274-78.

23. 438 plots, each 1 row x 66' = T J-y a. T.A. 

2J a. G. 

McClelland, C. K. Some determinations of 
plot variability. J. Amer. Soc. Agron., 18, 
819-23. 

24. 120 plots, each T V a. T.A. 12 a. G. 

Smith, L. H. Plot arrangement for variety 
experiments with corn. Proc. Amer. Soc. 
Agron., 1, 84-89. 

25. 5 trials, each of about 160 plots, each 20 

ridges X 7 metres — fjj a. T.A. 4 a. G. 

Bailey, M. A., and Trought, T. An account
of experiments carried out to determine the
experimental error of field trials with cotton
in Egypt. Min. Agric. Egypt Tech. and Sci.
Service Bull. 63.

26. 490 plots, each 1 row X 21' = ^ a. T.A. 

y a. E. 

Ducker, H. C. Unpublished data. 

27. (1) 200 plots, each V x 24' = y^ a. 

T.A. i a. N. 

(2) 200 plots, each 4£' X 16' = -$h) a * 
T.A. J a. 

Fu Siao. Uniformity trials with cotton. J. 
Amer. Soc. Agron., 27, 12. 

28. 3696 individual plants. R.

Heath, O. V. S. Unpublished data.

Height, node number and dry matter of 
individual cotton plants. 

29. 1280 plots, each 4 rows x 4*8' = T<J \nr a. 

T.A. | a. E. 

Hutchinson, J. B., and Panse, V. G.
Studies in the technique of field experiments.
I. Indian J. Agric. Sci., 5, 523-38.





Fodder 

Corn. 


Grapes. 


Hops. 


Lemons. 


Lentils. 


Maize. 


30. (1) 576 plots, each 3*5' X 40'= ff J 0 a. T.A. 

2 a. 

(2) 576 plots, each 3*5' X 30' = ^ a. 
T.A. 1 j a. R. 

MacDonald, D. Unpublished data. 

31. 480 plots, each 1 row X 25' = a. T.A. 1 a. R. 

McKinstry, A. H. Unpublished data. 

32. (1) 300 plots, each 3' X 48' = a. T.A. 1 a. R. 

(2) 700 plots, each 3J' X 47' = a. T.A. 

2| a. 

Reynolds, E. B., Killough, D. T., and Vantine,
J. T. Size, shape and replication of plats for field
experiments with cotton. J. Amer. Soc.
Agron., 26, 725-34.

33. 63 plots, each 15' X 112J' = a. T.A. 2| a. G. 

Morgan, J. O. Some experiments to determine
the uniformity of certain plats for field
tests. Proc. Amer. Soc. Agron., 1, 58-67.

34. 200 vines, 8' apart in rows 10' apart. G. 

Strickland, A. G., Forster, H. C., and Vasey,
A. J. A vine uniformity trial. J. Agric. of
Victoria, 30, 584.

35. 30 plots, each 1 row x 210'. Yields each year. 

1909-14. G. 

Stockberger, W. W. Relative precision of 
formulae for calculating normal plot yields. 

J. Amer. Soc. Agron., 8, 167-75. 

36. 364 individual tree yields. G. 

Batchelor, L. D., and Reed, H. S. Relation 
of the variability of yields of fruit trees to the 
accuracy of field trials. J. Agric. Res., 12,
245-83. 

37. 390 plots, each 4' x 4' = a. T.A. J a. G. 

Bose, R. D. Some soil heterogeneity trials 
at Pusa and the size and shape of experimental 
plots. Ind. J. Agric. Sci., 5, 545.

38. 83 plots, each 33' x 33' = ^ a. T.A. 2$ a. G. 

Beckett, W. H., and Fletcher, S. R. B. A 
uniformity trial with maize. Gold Coast Dept. 
Agric. Bull. 16, 222-26. 

Germination and ear number counts given. 

Yields measured for 15 plots only. 





39. 300 plots, each 1 row x 60'. R. 

Saunders, A. R. Statistical methods with
special reference to field experiments. Union
of South Africa, Dept. of Agric. and Forestry,
Science Bull. 147.

40. 250 plots, each 1 row x 10 plants. R. 

Saunders, A. R. Unpublished data. 

41. (1) 175 plots each 13-2' x 13-2' = a. 

T.A. | a. G. 

Yields each year from 1922-26. 

(2) 35 plots, each 13*2' x 66 ' = fa a. 
T.A. f a. 

Yields each year from different ranges 1927,
1928, 1929, 1930, 1931, 1932.

Summerby, R. The value of preliminary
uniformity trials in increasing the precision of
field experiments.

Mangolds. 42. 30 plots, each ^ a. T.A. £ a. G. 

Hanson, N. A. Prøvedyrkning paa Forsøgsstationen
ved Aarslev. Tids. for Landbrugets
Planteavl, 21, 553.

43. 200 plots, each 3 rows x 30J' = -zfoj a. T.A. 

1 a. G. 

Mercer, W. B., and Hall, A. D. The experimental
error of field trials. J. Agric. Sci., 4,
107-132.

44. (1) 175 plots, each 13*2' x 13*2' = a. 

T.A. f a. G. 

(2) 150 plots, each 13*2' x 13*2' = ^ a. 
T.A. \ a. 

Summerby, R. The value of preliminary 
uniformity trials in increasing the precision of 
field experiments. 

45. 1050 plots, each a * T.A. 1 a. N. 

Wood, T. B., and Stratton, F. J. M. The 
interpretation of experimental results. J. 
Agric. Sci., 3, 417-40. 

Millet. 46. 105 plots, each ^ a * T.A. J a. G. 

Lehmann, A. Report of Agricultural
Chemist. Dept. of Agric. Mysore State,
1900-7. Roemer, Th. Der Feldversuch.
Arbeiten der Deutschen Landw. Gesellschaft,
302.





Mushrooms. 


Oats. 


47. 600 plots, each 1' X 15' = ^Vrr a. T.A. J a. G. 

Li, H. W., Meng, C. J., and Liu, T. N. 
Field results in a millet-breeding experiment. 

48. (1) 50 plots each 2' x 5'== 4h \ v a. T.A. r ^a. G. 

(2) 50 plots each 4' X 6 ' = T hV f a. T.A. 

iV a * 

(3) 40 plots each 4' X 6 ' = -ysVit a * T.A. 
xV a - 

Lambert, E. B. Size and arrangement of
plots for yield tests with cultivated mushrooms.
J. Agric. Res., 48, 1971-80.

49. (1) 66 plots, each J a. T.A. 17 a. E. 

(2) 68 plots, each J a. T.A. 17 a. 

Farrell, F. D. Interpreting the variability
of plat yields. U.S. Dept. of Agric. Bureau of
Plant Industry Circular No. 109, 27-32.

50. 295 plots, each 21' x 68 ' = ^ a. T.A. 6 a. G. 

Garber, R. J., McIlvaine, T. C., and Hoover,
M. M. A study in soil heterogeneity in experiment
plots. J. Agric. Res., 33, 255-68.

51. 450 plots, each 21' x 68 ' (3£ feet discard all 

round) = a. T.A. 9 a. G. 

Garber, R. J., McIlvaine, T. C., and Hoover,
M. M. A method of laying out experimental
plots. J. Amer. Soc. Agron., 23, 286-98.

52. (1) 200 plots (3 yields missing) each — a. G. 

(2) 300 plots (3 yields missing) each = ^ a. 

Gorski, M., and Stefaniow, M. Die Anwendbarkeit
der Wahrscheinlichkeitsrechnung bei
Feldversuchen. Landw. Versuchsstationen, 90,
225-40.

53. (1) 30 plots, each ^ a. T.A. \ a. G. 

(2) 128 plots, each a. T.A. 1 a. 

Hanson, N. A. Prøvedyrkning paa Forsøgsstationen
ved Aarslev. Tids. for Landbrugets
Planteavl, 21, 553.

54. 46 plots, each 23J' x 317' — £ a. T.A. 8 a. G. 

1917. Harris, J. A., and Scofield, C. S.
Permanence of differences in the plats of an
experimental field. J. Agric. Res., 20, 335-56.

55. 46 plots, each 23J' X 317' = £ a. T.A. 8 a. E. 

Same trial as No. 54. 1911, total produce 
only. 





56. 207 plots, each a. T.A. 7 a. G. 

Kiesselbach, T. A. Studies concerning the
elimination of experimental error in comparative
crop tests. Res. Bull. Nebraska Agric.
Stat., 13, 1-95.

57. 36 plots, each a. T.A. a. E. 

Metzger, W. H. The relation of varying
rainfall to soil heterogeneity as measured by
crop production. J. Amer. Soc. Agron., 27, 274.

58. 24 plots, each 33' X 132' = a. T.A. 2£ a. G. 

McClelland, C. K. Some determinations of plot 
variability. J. Amer. Soc. Agron., 18, 819-23. 

59. 240 plots, each a. T.A. J- a. G. 

Roemer, Th. Der Feldversuch. Arbeiten
der deutschen Landw. Gesellschaft, 302.

60. 48 plots, each a. T.A. 5 a. G. 

Roth. Exp. Sta. Report, 1927-28, p. 153.

61. 512 plots, each V X 15' = .^Vcr a * T.A. J a. G. 

Summerby, R. A study of size of plats,
numbers of replications, and the frequency and
methods of using check plats in relation to
accuracy in field experiments. J. Amer. Soc.
Agron., 17, 140-50.

62. (1) 175 plots, each 13*2' x 13*2' = ^ a. T.A. 

| a. G. 

Yields each year 1922-26 and 1924-25-26. 

(2) 35 plots, each 13-2' x 66' = a. T.A. 

J a. Each year on different ranges from 1927 
to 1932. 

Summerby, R. The value of preliminary
uniformity trials in increasing the precision of
field experiments.

63. 124 plots, each 33' x 132' = a. T.A. 

m a. ^ G. 

Wyatt, F. A. Variation in plot yields due to
soil heterogeneity. Sci. Agr., 7, 248-56.

Oranges. 64. (1) 1000 individual tree yields. G. 

(2) 495 individual tree yields. 

(3) 240 individual tree yields. 

Batchelor, L. D., and Reed, H. S. Relation
of the variability of yields of fruit trees to the
accuracy of field trials. J. Agric. Res., 12,
245-83.





Paddy. 


Pasture. 


Peaches. 


Pineapples. 


Potatoes. 


65. 193 plots, each of 8 trees, yields given each year 

from 1921 to 1927. G. 

Parker, E. D., and Batchelor, L. D. Variation
in the yields of fruit trees in relation to the
planning of future experiments. Hilgardia, 7,
No. 2, 1932.

66. (1) 104 plots, each 6*6' X 122' = -^ a. T.A. 

2 a. N. 

(2) 72 plots, each 6-6' x 174' = a. T.A. 

2 a. 

Lord, L. Irrigated paddy : a contribution 
to the study of field plot technique. Agric. J. 
India, 19, 20-27. 

67. 760 plots, each 6*6' X 3*3' = ■ t >- Q \ JTr a. T.A. \ a. G. 

Davies, J. G. The experimental error of the
yield from small plots of natural pasture.
Council Sci. and Indust. Res. (Aust.) Bull. 48.

68. 144 individual tree yields. G. 

Strickland, A. G. Error in horticultural
experiments. J. Dept. Agric. Victoria, 33,
408-16.

69. (1) 24 plots, each 4 rows x 75'. G. 

(2) 24 plots, each 4 rows X 75'. 

(3) 25 plots, each 4 rows x 60'. 

Magistad, O. C., and Farden, C. A. Experimental
error in field experiments with pineapples.
J. Amer. Soc. Agron., 26, 631-44.

70. 750 single-row plots. E. 

Jakowski, Z. Unpublished data, see Neyman,
J. Statistical problems in agricultural
experimentation. J. Roy. Stat. Soc. Suppl., 2,
2, 107-54.

Yields filed with J. Neyman. 

71. 618 plots, each 2*2' x 33-5' = a. T.A. 1 a. E. 

Justesen, S. H. Influence of size and shape of 
plots on the precision of field experiments 
with potatoes. J. Agric. Sci., 22, 366-72. 

Data filed with the N.I.A.B., Cambridge, 
England. 

72. 576 plots, each 3' x 22' = a. T.A. 1 a. G. 

Kalamkar, R. J. Experimental error and 
the field plot technique with potatoes. J. 
Agric. Sci., 22, 373-85. 





Eagi. 


Eice. 


73. 204 plots, each 2^ x 72^ — htu a. T.A. 1 a. G. 

Lyon, T. L. Some experiments to estimate
errors in field plat tests. Proc. Amer. Soc.
Agron., 3, 89-114.

74. 720 plants, every fifth hill missing. G. 

Stewart, F. Missing hills in potato fields:
their effect upon the yield. New York Agric.
Exp. Sta. Bull. 459, 45-69.

75. 4 sets, each 3' x 15' = a. N. 

(1) 1000 plats. (2) 1560 plats. (3) 2000 
plats. (4) 1000 plats. 

Thompson, E. C. Size, shape, etc., in sweet 
potatoes field-plot experiments. J. Agric. 
Res., 48, 379-99. 

76. 51 plots, each Jq- a. T.A. 2| a. N. 

Westover, K. C. The influence of plat size
and replication on experimental errors in field
trials with potatoes. West Virginia Agr.
Expt. Sta. Bull. 189.

77. 34 plots, each -jw a. T.A. a. G. 

Yields for 4 years 1905-8. 

Lehmann, A. Report of Agric. Chemist.
Dept. of Agric. Mysore State, 1900-7. See also
Roemer, Der Feldversuch. 1st Ed.

78. (1) 144 plots each 5' X 5' = T7 1 T ?y a. T.A. 

T5 a. N. 

(2) 144 plots each 5' X 5' = a. T.A. 

A* 

Plots arranged in a 12 x 12 Latin square. 

Bose, S. S., Ganguli, P. M., and Mahalanobis,
P. C. The frequency distribution of plot
yields and the optimum size of plots in a uniformity
trial with rice in Assam. Indian J.
Agric. Sci., 1936, 6, part 5, pp. 1107-22.

79. 3 series of 100 plots, each 1£' x 14-2' = a. 

T.A.^a. “ - N. 

Chien-Liang-Pan. Uniformity trials with 
rice. J. Amer. Soc. Agron., 27, 279. 

80. 54 plots each 33' x 33' = ^ a. T.A. 

li a. G. 

Coombs, G. E., and Grantham, J. Field
experiments and the interpretation of their
results. Agr. Bull. Fed. Malay States, 4.





Rubber. 


Rye. 


Seeds. 


Silage corn. 


81. 300 plots, each 1-5' X 14*25' — irnnr a. T.A. 

$ a. . N. 

Li-Ying-Shen. Statistical analysis of a
blank test of rice with suggestions for field
technique. Agricultura Sinica, 1934, 1, No.
4, pp. 107-50.

82. 560 plots, each 10' X 10' = a. T.A. 1J a. G. 

Lord, L. A uniformity trial with irrigated
broadcast rice. J. Agric. Sci., 21, 178-86.

83. Plots 3' x 3' = a. N. 

Mitra, S. H., and Ganguli, P. M. A uniformity
trial in rice. Proc. 21st Annual Indian
Sci. Congress Bombay, 1934, 71.

84. 280 plots, each 2 rows x 10 plants. N. 

Parnell, F. R. Experimental error in variety
tests with rice. Agric. J. India, 14, 747-57.

See also No. 66. 

85. 1000 tree yields for each of 10 years. R.

Murray, R. K. S. The value of a uniformity
trial in field experimentation with rubber. J.
Agric. Sci., 24, 177-84.

86. 161 trees each year from 1921-22 to 1924-25. G. 

Taylor, R. A. The inter-relationship of
yield and the various vegetative characters in
Hevea brasiliensis. Dept. of Agric. Ceylon Bull.
77.

, 87. (1) 30 plots, each a. T.A. J a. G. 

(2) 128 plots, each y£y a. T.A. 1 a. 

Hanson, N. A. Prøvedyrkning paa Forsøgsstationen
ved Aarslev. Tids. for Landbrugets
Planteavl, 21, 553.

88. 128 plotB, each a. T.A. 1 a. G. 

Hanson, N. A. Prøvedyrkning paa Forsøgsstationen
ved Aarslev. Tids. for Landbrugets
Planteavl, 21, 553.

89. 46 plots, each 23J' x 317' = £ a. T.A. 8 a. G. 

1918. Harris, J. A., and Scofield, C. S.
Permanence of differences in the plats of an
experimental field. J. Agric. Res., 20, 335-56.
1920, 1925.

Further studies on the permanence of differences
in the plats of an experimental field. J.
Agric. Res., 36, 15-40.





Sorghum. 90. 160plots,each a.for 1930-31-32. T.A.la. 6. 

Kulkarni, E. K., Bose, S. S., and Mahalanobis,
P. C. The influence of shape and size of
plots on the effective precision of field experiments
with sorghum. Indian J. Agric. Sci.,
6, 460-74.

91. 2000 plots, each 1 row x 1 rod = ^ a. T.A. 

2£ a. " G. 

Stephens, J. C., and Vinall, H. N. Experimental
methods and the probable error in field
experiments with sorghum. J. Agric. Res.,
37, 629-46.

92. 400 plots, each 3*3' X 33' = a. T.A. 1 a. N. 

Swanson, A. F. Variability of grain
sorghum yields as influenced by size, shape and
number of plats. J. Amer. Soc. Agron., 22,
833-38.

Sorgo. 93. 36 plots, each a. T.A. If a. E. 

2 years, 1932-33. 

Metzger, W. H. The relation of varying 
rainfall to soil heterogeneity as measured by 
crop production. J. Amer. Soc. Agron., 27, 
274-78. 

Soy beans. 94. 30 plots: artificially constructed in frames, 

each 4|' x 9£' = a. T.A. a. G. 

Garber, R. J., and Pierre, W. H. Variation
of yields obtained in small artificially constructed
field plats. J. Amer. Soc. Agron., 25,
98-105.

95. (1) 882 plots, each 1 row x 8 ' — a. T.A. 

J a. G. 

(2) 1540 plots, each 1 row x 8 ' = &• 

T.A. | a. 

Odland, T. E., and Garber, R. J. Size of plot
and number of replications in field experiments
with soy beans. J. Amer. Soc. Agron., 20,
94-108.

Strawberries. 96. (1) 120 plots, each 4' x 68 ' = y Ju a. T.A. f a. N. 

(2) 80 plots, each 4' x 34' = ^4^ a. T.A. J a. 

Wilcox, A. N. A study of field plot technique
with strawberries. Scientific Agriculture,
8, 171-74.




Sugar-beet, 97. 4G plots, each 23|' X 317' = | a. T.A. 8 a. G, 

Harris, J. A., and Scofield, C. S. Permanence
of differences in the plats of an experimental
field. J. Agric. Res., 20, 335-56.

98. 600 plots, each 1 row x 33' = 7 a. T.A. 1 a. G. 

Immer, F. E. Size and shape of plots in 
relation to field experiments with sugar-beets. 

J. Agric. Res., 44, 649-68. 

99. 600 plots, each 1 row X 33' = a. T.A. 1 a. E. 

Immer, F. E., and Raleigh, S. M. Further
studies of size and shape of plot in relation to
field experiments with sugar-beet. J. Agric.
Res., 47, 591-98.

100. 416 plots, each 8' X 135' = T V a. T.A. 10$ a. G. 

Roemer, Th. Der Feldversuch. Arbeiten
der deutschen Landw. Gesellschaft, 302.

101. 96 plots, each 1 row x 55-8'. G. 

Two sets, 1916 and 1918. 

Roemer, Th. Der Feldversuch. Arbeiten
der deutschen Landw. Gesellschaft, 302.

Sugar cane. 102. 49 plots, each a. T.A. 1 a. E. 

Barbados, 1927. 

103. 48 plots, each 30' x 75' = y 1 ^ a. T.A. 2$ a. G. 

Borden, R. J. Replications of plot treatments
in field experiments. Hawaiian Planters'
Record, 34, 151-55.

104. 203 trials each of 36 plots. E. 

Demandt, E. De Resultaten der Blanco-Proeven
met 2878 POJ van Oogstjaar 1931.
Archief voor de Suikerindustrie in Nederlandsch-Indië,
Deel III. Med. van het Proefstation
voor de Java-Suikerindustrie, Jaargang
1932, 14.

Yields filed at the Proefstation voor de
Java-Suikerindustrie, Soerabaia, Java.

105. Yields of 1200 individual stools. G. 

Evans, H. Some preliminary data concerning
the best shape and size of plot for field
experiments with sugar cane. Dept. of Agric.
Mauritius. Sugar Cane Research Station
Bull. 3.

106. 710 plots, each 5' X 50' = T J T a. T.A. 6 a. E. 

H. Evans. Unpublished data. 





107. (1) 960 plots, each 3 X30£ —T.A.l^a. B. 

(2) 1088 plots, each 3' x 60' = J-j a. T.A. 
a. G. 

Wynne Sayer, Vaidyanathan and Subramaria
Iyer. Ideal size and shape of sugar-cane
experimental plots based upon tonnage experiments
with Co 205 and Co 213 conducted in
Pusa. Indian J. Agric. Sci., 1936, 6.

108. 968 plots, each 3' x 60' = 1/242 a. T.A. 4 a. G.

Wynne Sayer and Krishna Iyer. On some
of the factors that influence the error of field
experiments with special reference to sugar
cane. Indian J. Agric. Sci., 1936, 6, 917.

Swedes. 109. 48 plots, each a. T.A. 5 a. G. 

Roth. Exp. Sta. Report, 1925-26.

Roots, tops and plant number given.

Tea. 110. 144 plots, each 1/72 a. T.A. 2 a. G.

Eden, T. Studies in the yield of tea. J. 
Agric. Sci., 21, 547-73. 

Yields and dry matter at 94° C. given. 

111. 24 plots. G. 


Vaidyanathan, M. The method of co-variance
applicable to the utilization of the
previous crop records for judging the improved
precision of experiments. Ind. J. Agric. Sci.,
4, 327-42.

Timothy hay. 112. 240 plots, each 16½' x 16½' = 1/160 a. T.A. 1½ a. G.

Holtsmark, G. U., and Larsen, B. E. Über
die Fehler, welche bei Feldversuchen durch die
Ungleichartigkeit des Bodens bedingt werden.
Landw. der Versuchsstationen, 65, 1-22. See
also Roemer, Th., Der Feldversuch. 1st Ed.

113. 35 plots, each 13.2' x 66' = 1/50 a. T.A. f a. G.

Each year 1929-32 on different parts of the 
same field. 

Summerby, R. The value of preliminary
uniformity trials in increasing the precision of
field experiments.

Tomatoes. 114. 180 plots, each of 6 plants. G. 

Strickland, A. G. Error in horticultural
experiments. J. Dept. Agric. Victoria, 32,
408-16.





Walnuts. 115. 320 individual seedling tree yields. G. 

Batchelor, L. D., and Reed, H. S. Relation
of the variability of yields of fruit trees to the
accuracy of field trials. J. Agric. Res., 12,
245-83.

Wheat. 116. 390 plots, each 4' x 4' = a. T.A. J a. G. 

Bose, R. D. Some soil heterogeneity trials
at Pusa and the size and shape of experimental
plots. Ind. J. Agric. Sci., 5, 545.

117. (1) 288 plots, each 8" X 7J' = a. T.A. 

tV a * G. 

(2) 288 plots, each 8 ff x8' = a. T.A. 
v 1 - a. R. 

Christidis, B. G. The importance of the 
shape of plots in field experimentation. J. 
Agric. Sci., 21, 14-37. 

118. 3100 plots, each 8" X 5' = a. T.A. J a. N. 

Day, J. W. The relation of size, shape and
number of replications of plots to probable error
in field experimentation. J. Amer. Soc. Agron.,
12, 100-5.

119. 160 plots, each 13.2' x 19.8' = a. T.A. 1 a. G.

Forster, H. C., and Vasey, A. J. Experimental
error of field trials in Australia. Victoria
J. Dept. Agric., 27, 385-95.

120. 450 plots, each 21' x 68 ' (3|' discard all 

round) — a. T.A. 9 a. G. 

Garber, R. J., McIlvaine, T. C., and Hoover,
M. M. A method of laying out experimental
plots. J. Amer. Soc. Agron., 23, 286-98.

121. 295 plots, each 21' x 68 ' (a border of 3f' all 

round rejected) = -^ a. T.A. 5f a. G. 

Garber, R. J., McIlvaine, T. C., and Hoover,
M. M. A study in soil heterogeneity in experimental
plots. J. Agric. Res., 33, 255-68.

Yields obtained by sampling 5 rod rows. 

122. 30 plots, artificially constructed in frames each 

4§' X 9£\ ^ G. 

Garber, R. J., and Pierre, W. H. Variation
of yields obtained in small artificially constructed
field plots. J. Amer. Soc. Agron., 25,
98-105.





123. 1280 plots, each A' X 1*6' = jrahoir a * T.A. 

tg a - G . 

Kalamkar, R. J. A study in sampling technique
with wheat. J. Agric. Sci., 22, 783-96.

124. 500 plots, each 11 rows x 10.82' = 1/500 a. T.A. 1 a. G.

Mercer, W. B., and Hall, A. D. The experimental
error of field trials. J. Agric. Sci., 4,
107-32.

125. 36 plots, each a * T.A. If a. R. 

Metzger, W. H. The relation of varying 
rainfall to soil heterogeneity as measured by 
crop production. J. Amer. Soc. Agron., 27, 
274-78. 

126. 224 plots, each 5|' X 5j' = T ^ rT a. T.A. 

J a. G. 

Montgomery, E. G. Experiments on wheat
breeding. U.S. Dept. Agric. Bureau of Plant
Industry Bull. 269.

127. 63 plots, each 15' X 112J' = ^ a. T.A. 

2 f a. " G. 

Morgan, J. O. Some experiments to determine
the uniformity of certain plots for field
tests. Proc. Amer. Soc. Agron., 1, 58-67.

128. ( 1 ) Winter wheat. 240 plots, each yvVu a * 

T.A. ^ a. G. 

( 2 ) Summer wheat. 230 plots, each -pr 1 ^ a. 
T.A.£a. 

Roemer, Th. Der Feldversuch. Arbeiten
der deutschen Landw. Gesellschaft, 302.

129. 48 plots, each ^ a. T.A. 5 a. G. 

Roth. Exp. Sta. Report, 1925-26.

130. 1080 plots, each 6 " X 1' = T.A. 

s\r a * R. 

Smith, H. E. Unpublished data. 

131. 360 plots, each 9 rows x 1 chain = T -Vo- a. 

T.A. 3 a. E. 

Waite Institute (Adelaide) Report, 1925-32. 

Yields filed at the Waite Institute. 

132. 1500 plots, each 1 row x 15' = a. T.A. 

i a, G. 

Wiebe, G. A. Variation and correlation in
grain among 1500 wheat nursery plots. J.
Agric. Res., 50, 331-57.





133. 94 plots, each j a. T.A. 1 a. N. 

Wiener, W. T. G., and Broadfoot, B. The
amount of variability which may be expected
to occur in a determination of comparative
yields in small grains. Proc. Fifth Ann.
Meetings Western Canadian Soc. Agr., 17-24.

134. 124 plots, each 33' x 132' = 1/10 a. T.A. 12½ a. G.

Wyatt, F. A. Variation in plot yields due to 
soil heterogeneity. Sci. Agric., 7, 248-56. 






INDEX 

TO THE 

INDUSTRIAL AND AGRICULTURAL 
RESEARCH SUPPLEMENT 

Vol. IV, 1937. 


PAGES 

Bartlett (M. S.). Some examples of statistical methods of research 
in agriculture and applied biology ...... 137-170 

Discussion: Mr. Gosset; Dr. Irwin; Dr. Wishart; Mr. Blackman;
Mr. Cochran; Mr. Fairfield Smith; Mr. Bartlett in reply . . . 170-183

-. Sub-sampling for attributes . . . 131-135

Bayes (A. W.). Some considerations of the variability of cotton
cloth strength . . . 61-80

Cloth strength as a measure of quality . . . 61

Description of cloth manufacture . . . 64

Description of data and some problems . . . 69

Consideration of two methods of routine sampling . . . 74

Practical difficulties with mill data . . . 78

Discussion: Mr. Tippett; Prof. Pearson; Colonel Hidden; Major
Myers; Mr. Cochran; Mr. Welch; Mr. Daniels; Mr. Gosset;
The Chairman; Mr. Bayes in reply . . . 80-93

Cochran (W. G.). Catalogue of uniformity trial data . . . 233-253 

——. Problems arising in the analysis of a series of similar experi¬ 
ments . 102-118 

Comrie (L. J.). The application of Hollerith Equipment to an
agricultural investigation . . . 210-224

Cotton cloth strength, considerations of the variability of. See 
Bayes (A. W.). 


Forest products research, statistical method in. See Van Rest
(E. D.).

Hollerith Equipment, application of, to an agricultural investiga¬ 
tion. See Comrie (L. J.). 

Irwin (J. O.). Statistical method applied to biological assays . . . 1-48

Part i. Principles . . . 2

Part ii. Applications . . . 31

Discussion: Prof. Gaddum; Dr. Coward; Mr. Yates; Mr.
Bartlett; Dr. Trevan; Prof. Burn; Dr. Irwin in reply; contribution
from Dr. Neyman . . . 49-60

Pearson (E. S.) and Welch (B. L.). Notes on some statistical 
problems raised in Mr. Bayes’s paper ..... 94-101 

Pitman (E. J. G.). Significance tests which may be applied to
samples from any populations . . . 119-130, 225-232

Problems arising in the analysis of a series of similar experiments.
See Cochran (W. G.).










Significance tests, etc. See Pitman (E. J. G.). 

Statistical method applied to biological assays. See Irwin (J. O.).

-method in forest products research. See Van Rest (E. D.).

-methods of research in agriculture and applied biology. See
Bartlett (M. S.).

Sub-sampling for attributes. See Bartlett (M. S.).

Uniformity trial data, catalogue of. See Cochran (W. G.). 

Van Rest (E. D.). Examples of statistical method in forest products
research . . . 184-203

Work of the Forest Products Research Laboratory . . . 184

Variability . . . 184

Examples . . . 187

Problems involving statistical reasoning . . . 200

Discussion: Sir R. L. Robinson; Mr. Bartlett; Mr. W. R. Robertson;
Prof. Pearson; Mr. Clarke; Mr. Tippett; Miss Pettifor;
Mr. Van Rest in reply . . . 203-209


Welch (B. L.). See Pearson (E. S.) and Welch (B. L.). 














